ABSTRACT


This thesis investigates the implementation and evaluation of commercial off-the-shelf
(COTS) voice recognition as an input interface within a windows-type environment.
The three software packages implemented and evaluated are DragonDictate for Windows
version 1.3, VoicePilot 2.0 (both manufactured by Dragon Systems, Inc.), and IN³ Voice
Command for SPARCstation version 2.2.2 by Command Corp. VoicePilot and
DragonDictate are both installed on PCs running MS Windows 3.1, and IN³ is installed on
a SPARCstation running OpenWindows 3 and SunOS 4.1.3. Several applications are
manipulated using voice recognition with these three software packages. The results of
this study show that DragonDictate has the most flexibility and ease of use as an input
device for a windows-type environment. It is also shown that as usage increases,
DragonDictate’s recognition accuracy can be improved to above 98%. Areas for
future research are also suggested.








TABLE OF CONTENTS


I. INTRODUCTION
    A. VOICE RECOGNITION AND C4I
    B. THE VOICE RECOGNITION INDUSTRY
        1. The Market
        2. Commercial Vendors and Uses of Voice Recognition
    C. CURRENT DOD INVOLVEMENT IN VOICE RECOGNITION
        1. Advantages and Disadvantages
    D. SUMMARY
II. AN INTRODUCTION TO VOICE RECOGNITION
    A. THE BASICS OF VOICE RECOGNITION
        1. Speaker-Dependent (SD) Vs. Independent (SI)
        2. Discrete Vs. Continuous Recognition
        3. Vocabulary Size
        4. Dictation Vs. Navigation Software
III. DRAGONDICTATE FOR WINDOWS VERSION 1.3
    A. DESCRIPTION
        1. Installation and Setting Up
        2. Training
        3. Using DragonDictate
    B. EVALUATION OF DRAGONDICTATE
        1. Dictation Evaluation
        2. Navigation Evaluation
    [entry illegible in source]
IV. [title illegible in source]
    A. MICROSOFT WINDOWS VOICE PILOT 2.0
        1. Installing and Setting Up Voice Pilot
        2. [entry illegible in source]
    B. IN³ VOICE COMMAND FOR SPARCSTATION
        1. Installing IN³ for SPARCstation
        2. Using IN³ Voice Command
        3. Building Templates
        4. Adding And Editing Commands
    C. EVALUATION
        1. VoicePilot Version 2.0
            a. Adding Vocabularies and Commands
            b. [entry illegible in source]
        2. IN³ Voice Command
            a. Adding New Vocabularies and Commands
    D. SUMMARY
V. CONCLUSIONS AND RECOMMENDATIONS
    A. CONCLUSIONS
    B. RECOMMENDATIONS FOR FURTHER RESEARCH
        1. Voice Recognition and the Internet
        2. Service Area Specific Applications Of Voice Recognition
        3. Use Of Voice Recognition Across Platforms
APPENDIX A. DRAGONDICTATE VOCABULARY LIST
    A. ALWAYS ACTIVE COMMANDS
    B. [title illegible in source]
    C. WINDOWS COMMANDS
    D. ARROW MOVEMENT
    E. DICTATION COMMANDS
    F. MOUSE MOVEMENT COMMANDS
    G. SYMBOLS AND PUNCTUATION
    H. CORRECTION COMMANDS
    I. NUMBERS AND KEYS
APPENDIX B. IN³ EVALUATION VOCABULARY
APPENDIX C. USERS GUIDE FOR DRAGONDICTATE FOR WINDOWS 1.3
    A. [title illegible in source]
    B. TRAINING
    C. ADDING NON-SUPPORTED WINDOWS APPLICATIONS
    D. ADDING VOCABULARY FOR UPGRADED APPLICATIONS
    E. CREATING NEW COMMANDS
APPENDIX D. DICTATION TEST PARAGRAPH
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST


LIST OF FIGURES


Figure 1. Three utterances of the word “cut,” sampled at 44 kHz.
Figure 2. Adding A New User
Figure 3. Microphone Selection
Figure 4. Tutorial Window
Figure 5. Training console
Figure 6. Number of recognition errors performed Vs Trials
Figure 7. DragonDictate Accuracy Vs Trials
Figure 8. [caption illegible in source]
Figure 9. [caption illegible in source]
Figure 10. [caption illegible in source]
Figure 11. [caption illegible in source]
Figure 12. Creating a new vocabulary.
Figure 13. [caption illegible in source]
Figure 14. Template creation in "All" mode.
Figure 15. Edit Command dialogue Window
Figure 16. VoicePilot accuracy [remainder illegible in source]
Figure 17. [caption illegible in source]
Figure 18. Add New Command window
Figure 19. [caption illegible in source]
Figure 20. Error Type percentages committed by IN³
Figure 21. Copy Program Item dialogue Window
Figure 22. Adding a new command [remainder illegible in source]
Figure 23. [caption illegible in source]








I. INTRODUCTION 


A. VOICE RECOGNITION AND C4I

In the past few years the Department of Defense (DoD) has placed an emphasis on
C4I (Command, Control, Communications, Computers, and Intelligence) for military
applications. An example of this is the issuance of many Service plans and directives on
the implementation of C4I within each of the major services. C4I is the future for all the
military services, and is playing a major role in the planning of future capabilities, makeup,
and budgetary issues within DoD.


To get a better look at what is expected from C4I, let us examine its
infrastructure. The C4I infrastructure for the Warrior is broken down into three
major areas: the Warrior terminal, the Warrior’s battlespace, and the Infosphere (a global
military and commercial communications system and network of information databases
and fusion centers accessible by the Warrior from anywhere at any time [Ref. 18: p. 10]).
We will concentrate on the Warrior terminal. The Director of C4 Systems, J-6, for the
Joint Staff describes the Warrior terminal as follows [Ref. 18: p. 9]:


The Warrior’s terminal is the processing equipment that will allow the Warrior to store all
required on-site information and share information in multimedia forms among other terminals
when required. The C4I terminal devices and their capabilities must be familiar to the
Warrior. This requires the terminal to have “manprint” (look, touch, feel) that is recognizable
to the user whether in the Pentagon or in the field. The terminal device may be phone size,
wrist watch size or even smaller as technology develops. The terminal must satisfy the
Warrior’s needs of any time, any location, and any mission. The terminal will be tailored to
the Warrior to best assist him or her in accomplishing the mission.... 


Looking closer at the Warrior terminal, we can focus on the “manprint” and
multimedia. To give the Warrior terminal a familiar look, touch, and feel that is
easily recognized by anyone wishing to operate it, the interface
between man and machine must be natural in its implementation. The natural interface for
the machine (in our case the computer) is digital. The natural mode of
communication for man is speech. To bridge this difference in forms of communication, a
device for translating speech to digital signals is required. For computers,
voice recognition software and microphones are the obvious answer to this problem. By
using voice recognition software, the Warrior would be able to speak to the computer and
have the computer process his or her commands. With voice recognition the Warrior
will be able to navigate through the applications available on the computer and will also
be able to dictate letters, memos, directives, etc.


This study will show examples of commercial off-the-shelf voice recognition
software, its capabilities, and its implementation. Each software package will be evaluated and
the results given in the conclusion. The software packages evaluated will be
DragonDictate and VoicePilot by Dragon Systems, Inc., and In-Cubed (IN³) by Command
Corp. These packages were selected for study because they did not require any
proprietary equipment.


B. THE VOICE RECOGNITION INDUSTRY 

1. The Market 

Voice recognition technology has made tremendous strides in the past few years.
Major areas of commercial application of voice recognition include dictation, personal
computer interfaces, inventory maintenance, automated telephone services, and
special-purpose industrial applications. The use of voice recognition in private and public
telephone companies is enjoying a tremendous amount of success. Voice recognition in
telecommunications is becoming a very lucrative market, averaging a 40.4% annual gain in
the automatic speech recognition (ASR) market. The overall market for automatic
speech recognition/voice recognition (ASR/VR) technology is expected to have an annual
growth of about 35% up to the year 1997 [Ref. 6: p. 57].


2. Commercial Vendors and Uses of Voice Recognition

Several vendors are producing voice recognition packages and application
development products. PCvoice Inc., BBN HARK Systems Corp., Speech Systems Inc.,
Dragon Systems Inc., Kurzweil Applied Intelligence Inc., IBM, Microsoft, Voice
Processing Corp., and Wildfire Communications Inc. have all released new voice
recognition packages this year. Both Macintosh and IBM are releasing computer systems
with voice recognition software included with the normal systems setup. WordPerfect
Corporation has teamed with Dragon Systems to develop voice-controlled word
processing software and other Windows-based software. This influx of voice recognition
software and applications is an indication that voice recognition is becoming more popular
as an interface device as the technology improves. Already on the market are
voice-activated controls for videocassette recorders and televisions; cellular phones that dial a
number when the user speaks the name of a person; and multimedia games, training
programs, and educational applications that respond to voice commands.


The IBM Personal Dictation System has overcome many of the hurdles faced by all
recognition software: recognition accuracy, command decoding speed, and vocabulary
size. It boasts a 95 to 98% recognition accuracy, which is about one mistake in
every 20 to 50 words spoken. It is able to handle up to 90 words a minute; average speaking
speed in a normal conversation is 80 words per minute. It comes with a 60,000-word
vocabulary that is customizable to incorporate job-oriented jargon. The vocabulary is
also expandable: with user-defined words it is able to accommodate up to 82,000 words.
IBM retails this product for $995. This price includes the proprietary card marketed by 
IBM. 
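
The link between a quoted accuracy figure and the expected spacing of errors is simple arithmetic. The following Python sketch converts an accuracy percentage into the average number of words spoken per error; the function name is illustrative and is not taken from any vendor's documentation.

```python
def words_per_error(accuracy_percent: float) -> float:
    """Average number of words spoken per recognition error."""
    error_rate = 1.0 - accuracy_percent / 100.0
    if error_rate <= 0.0:
        raise ValueError("accuracy must be below 100%")
    return 1.0 / error_rate


# The two ends of the accuracy range quoted for the IBM system.
for acc in (95.0, 98.0):
    print(f"{acc:.0f}% accuracy: about one error every "
          f"{words_per_error(acc):.0f} words")
```

At 95% accuracy this gives roughly one error in 20 words, and at 98% roughly one error in 50 words.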


For command and control systems there are many options available from the
aforementioned vendors that exhibit remarkable accuracy, speed, and vocabulary size for
commercial needs. The HARK Recognizer immediately comes to mind. Dr. Phillip F.
Carrigan, marketing director at UFA Inc., a developer of air traffic control simulation
systems, states that [Ref. 9: p. 9]:


The HARK Recognizer is the most mature, stable and robust speaker- 
independent product available... We depend on HARK products to handle 
the complex task of moving simulated aircraft in response to spoken 
commands... 

Telecommunication technology is leading the way in the use of voice 
recognition technology. Telephone services are boasting a projected savings of hundreds 
of millions of dollars. AT&T and Sprint already offer voice recognition-controlled 
services. Sprint even offers voice activated phone cards. 

C. CURRENT DOD INVOLVEMENT IN VOICE RECOGNITION

The Department of Defense has begun to incorporate voice recognition into some
of its information systems. In comparison to many major civilian organizations that have
incorporated voice recognition into their information system technology, the DoD is not
very far behind the level of implementation in industry. Many companies have
successfully integrated voice recognition into their security systems, word processing
packages, and even their telecommunications. AT&T already boasts in its television
commercials that it will be bringing technology that will allow you access to your home
via voice. “Smart” homes are being built that will turn on the stereo, start dinner, or
turn on any other appliance by voice command. Using computer control, one can do these
things over the telephone lines from a remote location.


Currently the United States Air Force Rome Laboratory in Rome, NY, and three affiliated
labs are developing systems that automatically identify individual speakers and the
language being spoken [Ref. 7: p. 57]. Monitoring of enemy radio signals and enhanced
analysis of aircraft accidents are two applications also being developed by the USAF.
Other military applications being explored are smart cockpits, which allow the pilot to orally
instruct a computer to take a selected course of action rather than flipping a switch, and
command and control systems, in which the operator orally instructs a computer rather than
using a keyboard [Ref. 7: p. 57]. The United States Navy is currently developing an Aegis
Combat Information Center system that would be operated using voice commands.


1. Advantages and Disadvantages 

By automating data input and retrieval using voice recognition, the DoD would be
able to eliminate the need for many of the administrative personnel who perform most of the
data retrieval and input in the current manual systems. Improved telecommunication service and
information systems interfaces are in keeping with improving DoD information systems 
(IS) technology. With the migration of call control from private branch exchanges (PBX) 
to the computing environment as computer telephony integration (CTI) evolves, the need 
for voice recognition software will increase as call centers’ role diminishes [Ref. 11: p. 
51]. 

The cost of a viable voice recognition system is very small in comparison to the
benefits of implementing the system. A typical commercially available system for
command and control can range from a one-time cost of less than $100 (for systems such as
Microsoft’s Sound System for Windows, Creative Labs’ VoiceAssist, and Covox’s
Speech Blaster) to more than $10,000 annually (for systems such as BBN HARK
Systems’ Recognizer 2.0 Developers Toolkit and technical support from BBN HARK).
With decreasing costs and increasing processor power of the newest personal computers,
the costs of voice recognition software are decreasing. IBM, VERBEX, Kurzweil
Applied Intelligence Inc., and other voice recognition software/hardware developers
are cutting prices for their products by as much as 50%. All of these systems
support Windows (version 3.x and eventually Windows 95), OS/2, DOS, and UNIX
operating systems on IBM-compatible PCs, Sun workstations, and Hewlett-Packard and
Silicon Graphics platforms. Initial investment would be minimal for implementation
throughout the DoD and its service components. The only requirement would be the
acquisition and implementation of the software/hardware required for the actual voice
recognition system. Most deployed and shore-based DoD assets have access to or are
already on IBM or UNIX systems.


Included in the cost would be a minimum of twenty to thirty minutes of lost
productivity while personnel “enroll” in the discrete speaker-dependent and some
speaker-adaptive systems. Enrolling entails a training period in which the user speaks
commands into the software in order to build the library of statistical models. The IBM
Personal Dictation System, for example, requires the user to read a Mark Twain short
story so that it can “learn” the user’s speech patterns. This time period would not be
necessary for most continuous, speaker-independent systems, which allow the user to start
giving voice commands immediately after installation.


Computer manufacturers are proceeding in their development with the assumption
that speech will become an important component of the computer interface [Ref. 5: p. 54].
Near-term opportunities in voice recognition include:


1. Speech as a shortcut. Rather than opening a file by traversing many levels of 
hierarchy with multiple key strokes, the user just has to say “Open budget.” 
An even timelier example is “Open the address book and call my barber.” By
incorporating intelligence and macros into the voice recognition software,
it is possible to gain greater flexibility.


2. Hands busy/eyes busy environments are easily adaptable to voice recognition 
systems. An air traffic controller could give commands to his computer while 
steadily scanning his equipment and the skies. Inventory managers, and 
weapons and ammunition control officers could simply speak into a portable 
system instead of carrying multiple sheets of inventory and ammunition 
records. Roving watch standers who take readings on machinery and 
soundings from tanks could simply speak into a portable system and cut their 
roving time by a third. 


3. Portability. Once a user is enrolled in a particular system, he could simply
download his file and upload it into another system that utilizes the same or
compatible voice recognition software. A person could be transferred to many
duty stations and never have to re-enroll on a voice recognition system.
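
The “speech as a shortcut” idea in item 1 can be sketched as a lookup from a recognized phrase to a macro, that is, an ordered list of actions. The Python below is a minimal illustration only: the phrase table, the helper functions `open_file` and `call_contact`, and the `log` list standing in for real side effects are all hypothetical and are not taken from any of the products discussed.

```python
from typing import Callable, Dict, List

log: List[str] = []  # records actions taken, standing in for real side effects


def open_file(name: str) -> None:
    log.append(f"open:{name}")


def call_contact(name: str) -> None:
    log.append(f"call:{name}")


# Each recognized phrase maps to a macro: an ordered list of actions.
MACROS: Dict[str, List[Callable[[], None]]] = {
    "open budget": [lambda: open_file("budget")],
    "open the address book and call my barber": [
        lambda: open_file("address book"),
        lambda: call_contact("barber"),
    ],
}


def run_phrase(phrase: str) -> bool:
    """Run the macro bound to a recognized phrase; return False if unknown."""
    actions = MACROS.get(phrase.strip().lower())
    if actions is None:
        return False
    for action in actions:
        action()
    return True
```

A single utterance such as “Open the address book and call my barber” then triggers two actions in sequence, which is the flexibility the macro approach is meant to provide.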


The Naval Postgraduate School thesis by Earl Hill and Leo Kotowski further lists
the advantages of voice recognition and separates them into three categories: engineering,
psychological, and physiological [Ref. 10: pp. 35-38]:


A. Engineering

    1. Advantages

        a) Can be faster than other [input] modes.

        b) Can be more accurate than other [input] modes.

        c) Compatible with communications systems (telephone).

        d) Can reduce manpower requirements.

    2. Disadvantages

        a) Possible interference from noise, distortions, and competing
           talkers.

        b) Physical conditions (vibrations and physical orientation of
           speaker) may change speech patterns.

        c) No permanent record of speech (unless explicitly recorded).

        d) Microphones needed for speech input, and acoustic
           speakers needed for speech output.

B. Psychological

    1. Advantages

        a) Most natural form of human communication.

        b) Best for group problem solving.

        c) Universal among humans.

        d) Can reduce visual information overload.

        e) Increases in value when person is engaged in complex
           thought processes.

    2. Disadvantages

        a) Speech is not private; others may eavesdrop.

        b) Psychological changes (stress) may change one’s speech
           characteristics.

        c) Speech synthesis may interfere with other aural indicators.

C. Physiological

    1. Advantages

        a) Requires less effort and motor activity than other [input]
           modes.

        b) Frees the hands and eyes.

        c) Permits multimodal operation.

        d) Feasible in darkened area.

        e) Is omnidirectional; does not require direct line of sight
           between user and ASR system.

        f) Permits operator mobility.

        g) Contains information on identity and emotional state of the
           speaker.

        h) Contains information on the physical state of the speaker.

        i) Simultaneous interaction with man and machines.

    2. Disadvantages

        a) Prolonged speaking may cause fatigue, which may in turn
           change speech characteristics.

        b) Illness may change speech characteristics.

Studies have been performed both at the Naval Postgraduate School and by others 
that demonstrate and support the definite advantages of speech input over other currently 
available forms of input. These include reports on the effects of stress and changing 
environments on the user of various recognition systems (most of these were performed 
by the late Gary K. Poock, formerly a professor with the Systems Management 
department at the Naval Postgraduate School), the effect of feedback on users of ASR 
equipment, and the effects of various background noises on ASR systems’ recognition
capabilities. 

D. SUMMARY 

Organizations using speech technology properly can enjoy enormous savings. The 
US Postal Service, for instance, projects that it will save $30 million by using a voice 
recognition system for mail sorting [Ref. 8: p. 52]. AT&T reportedly could save as much
as $100 million annually by using speech recognition technology to replace up to 17,000
human operators; the company has already used the technology to eliminate 2,000
operators [Ref. 8: p. 52]. The DoD could achieve similar savings by utilizing voice
recognition technology in its information systems. Doing so would eliminate the need for most
Personnelmen, Yeomen, and other administrative rates, since fewer personnel would be
required to maintain computer-based records and to dictate letters and memos. Most
Commanding Officers and Department Heads could dictate and send their own messages
and letters using voice recognition technology.


Many of the disadvantages connected to voice recognition and its usage as a
means of data input can be overcome by engineering and/or controlling the environment.
Many of the physiological advantages work toward easing the stress and fatigue on the
user, enabling him to become more effective and versatile in a C4I environment. This
thesis will cover the implementation and evaluation of three voice recognition software
packages currently available commercially. The evaluation will cover their usage in a
windows-type environment within their respective required operating systems.








II. AN INTRODUCTION TO VOICE RECOGNITION 


A. THE BASICS OF VOICE RECOGNITION 

Voice recognition (VR), also called Automatic Speech Recognition (ASR), is the
ability of speech software and hardware to convert spoken words into text or commands.
Voice recognition requires the use of an analog-to-digital (A/D) converter, with the
remaining computations (using a complex algorithm) taking place on a general-purpose
computer. Voice recognition systems match a transform of incoming speech against a
representation stored in some form of permanent memory [Ref. 5: p. 2]. A recognizer will
make use of acoustic models that capture phonetic or word-level properties of speech and
often a statistical model that captures the syntactic and semantic regularities of language in
a particular domain [Ref. 5: p. 52]. Most leading technologies use a Hidden Markov
Model (HMM) algorithm, or a Neural Network/Hidden Markov Hybrid System. The
Neural Network/Hidden Markov Hybrid System is used to reduce inaccuracies in the
HMM, which arise because [Ref. 13: 6/12/95]:


... traditional HMMs make some false assumptions, e.g., that speech features occurring at
one time are uncorrelated, and independent of other recently occurring features (even ten
milliseconds earlier). SRI has developed a hybrid neural network/hidden Markov model speech 
recognizer that improves the accuracy of traditional HMM by modeling correlations among 
simultaneously occurring speech features and between current and recent features. Future 
work involves modeling longer-term correlations, using better basic speech features, and 
integrating higher-level linguistic constraints. 
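
To make the HMM idea concrete, the following Python sketch decodes the most likely sequence of hidden states for a short sequence of acoustic labels using the standard Viterbi algorithm. It illustrates a plain HMM only, not the SRI hybrid described in the quotation. The three states, the two observation labels, and all probabilities are invented for illustration.

```python
from typing import Dict, List, Tuple


def viterbi(obs: List[str],
            states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> Tuple[float, List[str]]:
    """Return the probability and state path that best explain obs."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for symbol in obs[1:]:
        step = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1])
                 for prev in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][symbol], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


# Toy left-to-right model of the word "cut" (all values are assumptions).
STATES = ["k", "ah", "t"]
START = {"k": 1.0, "ah": 0.0, "t": 0.0}
TRANS = {
    "k": {"k": 0.5, "ah": 0.5, "t": 0.0},
    "ah": {"k": 0.0, "ah": 0.5, "t": 0.5},
    "t": {"k": 0.0, "ah": 0.0, "t": 1.0},
}
EMIT = {
    "k": {"burst": 0.8, "voiced": 0.2},
    "ah": {"burst": 0.1, "voiced": 0.9},
    "t": {"burst": 0.7, "voiced": 0.3},
}

prob, path = viterbi(["burst", "voiced", "voiced", "burst"],
                     STATES, START, TRANS, EMIT)
print(path, prob)
```

Note that each emission probability depends only on the current state. This is the independence assumption that the quoted passage identifies as a weakness of traditional HMMs.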


Voice recognition systems are categorized along a number of standard dimensions.
Where a system falls in these dimensions strongly determines a system’s capabilities.
These dimensions are speaker-dependent or speaker-independent, dictation or navigation
software, continuous or discrete recognition, and small or large vocabulary. Normal
human speech is continuous, with an unlimited vocabulary, and speaker-independent, but
in many applications none of these characteristics is required [Ref. 12: p. 35.4].


1. Speaker-Dependent (SD) Vs. Independent (SI)

A speaker-dependent system is trained to a particular voice, whereas a
speaker-independent system is able to recognize the speech of many different individuals without
training. Also available are speaker-adaptive systems that operate as SI systems but adapt
to the speech patterns of an individual with more use, with a concomitant increase in
recognition accuracy. Speaker-independent systems are difficult to produce because of the
differences in accent, pitch, inflection, etc. Thus most commercially available systems are
speaker-dependent.


The type of training needed for significant accuracy in speaker-dependent systems
requires the user to repeat each word a number of times. This is especially true for
systems with small vocabularies. The speaker-dependent system then uses this
information to create a model of the word and incorporates a variability factor that
accounts for slight changes in pronunciation for each utterance (Figure 1).





Figure 1. Three utterances of the word “cut,” sampled at 44 kHz. (X-axis is time in 
seconds.) 
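
One simple way to picture a word model with a variability factor is a per-feature mean and spread computed from the repeated utterances. The Python sketch below assumes each utterance has already been reduced to a fixed-length list of numeric features. That representation, the function names, and the tolerance value are assumptions for illustration, not a description of how any of the evaluated products build their models.

```python
from statistics import mean, pstdev
from typing import List, Tuple

# A template is (per-feature mean, per-feature spread).
Template = Tuple[List[float], List[float]]


def train_word(utterances: List[List[float]]) -> Template:
    """Build a word model from several repetitions of the same word."""
    columns = list(zip(*utterances))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def matches(template: Template, features: List[float],
            tolerance: float = 3.0) -> bool:
    """Accept if every feature lies within `tolerance` spreads of its mean."""
    means, spreads = template
    # max(s, 1e-6) avoids a zero-width window when all repetitions agree.
    return all(abs(f - m) <= tolerance * max(s, 1e-6)
               for f, m, s in zip(features, means, spreads))


# Three repetitions of one word, each reduced to two made-up features.
cut_model = train_word([[1.0, 5.0], [1.2, 5.4], [0.8, 4.6]])
print(matches(cut_model, [1.1, 5.3]))  # close to the trained repetitions
print(matches(cut_model, [3.0, 5.0]))  # first feature far from the mean
```

More repetitions give a better estimate of the spread, which is one way to see why training improves accuracy for speaker-dependent systems.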




2. Discrete Vs. Continuous Recognition

The recognition type determines whether a user needs to separate individual words by
short silences. Discrete recognition, or independent word recognition (IWR), systems are
easier to implement because the system knows the exact extent of each word and can use
this information to improve decoding accuracy. Continuous recognition is far more
difficult because there is little or no break between the utterances of words
in a particular phrase. This makes it extremely difficult for the software to correctly
decode the words in the phrase.
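
The advantage of discrete recognition can be illustrated with a simple energy-based segmenter: when words are separated by short silences, word boundaries fall out of a threshold test. The Python sketch below assumes the audio has already been reduced to one energy value per frame. The threshold, the minimum gap length, and the sample values are all illustrative assumptions.

```python
from typing import List, Optional, Tuple


def split_on_silence(energy: List[float],
                     threshold: float,
                     min_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges of words; end is exclusive.

    A frame is silent when its energy is below `threshold`; a word ends
    once at least `min_gap` consecutive silent frames have been seen.
    """
    words: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the word being tracked
    silent_run = 0               # consecutive silent frames since last speech
    for i, e in enumerate(energy):
        if e >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_gap:
                words.append((start, i - silent_run + 1))
                start, silent_run = None, 0
    if start is not None:
        # Audio ended mid-word: close it at the last non-silent frame.
        words.append((start, len(energy) - silent_run))
    return words


frames = [0, 0, 5, 6, 5, 0, 0, 0, 7, 8, 0, 4, 0, 0, 0]
print(split_on_silence(frames, threshold=1, min_gap=2))
```

With continuous speech there are no sufficiently long silent runs, so the same routine returns a single segment covering the whole phrase and the recognizer must find the word boundaries by other means.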


3. Vocabulary Size 

The system is better able to recognize a word if the vocabulary is very small,
because there are fewer alternative words from which the system has to choose. The
vocabulary size also determines the choice of algorithm and the details of implementation
[Ref. 5: p. 54]. Most small-vocabulary software packages contain about 1,000 words in their
vocabulary. Larger systems handle anywhere from about 20,000 to 70,000 words.


Many of the commercially available small-vocabulary systems handle several
vocabularies. They do this by loading the individual vocabularies of the applications that
they can control and arranging those vocabularies in a tree-structured fashion. The words or
commands that are used to start or end each application are stored in the root of the
structure. When the system recognizes a word that begins an application, it retrieves the
specific vocabulary for that application and makes it the active vocabulary. In a
windows-type environment where there is multitasking, the vocabulary for the active window is
selected.
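
The tree-structured arrangement described above can be sketched as a root vocabulary of start and stop commands plus one vocabulary per application, with only the root and the active application's words eligible for recognition. In the Python below, the class name, the “open”/“close” command words, and the sample applications are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Set


class VocabularyTree:
    """Root vocabulary of start/stop commands plus one vocabulary per app."""

    def __init__(self, app_vocabularies: Dict[str, Set[str]]) -> None:
        self.apps = app_vocabularies
        self.active_app: Optional[str] = None

    def root_words(self) -> Set[str]:
        # Commands that start or end applications live at the root.
        return ({f"open {a}" for a in self.apps}
                | {f"close {a}" for a in self.apps})

    def active_words(self) -> Set[str]:
        words = self.root_words()
        if self.active_app is not None:
            words |= self.apps[self.active_app]
        return words

    def hear(self, phrase: str) -> bool:
        """Accept a phrase only if it is in the currently active vocabulary."""
        if phrase not in self.active_words():
            return False
        if phrase in self.root_words():
            verb, app = phrase.split(" ", 1)
            if verb == "open":
                self.active_app = app    # switch the active vocabulary
            elif app == self.active_app:
                self.active_app = None   # closing the active application
        return True


tree = VocabularyTree({"editor": {"save", "bold"}, "mail": {"send"}})
print(tree.hear("save"))         # rejected: no application is active yet
print(tree.hear("open editor"))  # accepted: root command
print(tree.hear("save"))         # accepted: editor vocabulary is now active
```

Keeping the active set small is what lets a small-vocabulary recognizer control many applications without having to choose among all of their commands at once.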

Large-vocabulary systems require a different training mechanism. It is impractical
to repeat thousands of words thousands of times. Large-vocabulary systems do not
recognize words in the same manner as small-vocabulary systems. They base their
recognition schemes on elements smaller than a word, such as syllables and phonemes
[Ref. 12: p. 35.5]. Because the actual pronunciation of a particular phoneme is subject to
the surrounding phonemes and its corresponding allophones, it is possible to use a small
number of phonemes to represent a large number of words. Only around 40 phonemes are
required to speak the English language, which has over 40,000 words [Ref. 12: p. 35.6].
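
A toy pronunciation lexicon shows why a small phoneme inventory can cover many words: each word is stored as a sequence of phonemes, and different orderings of the same few phonemes yield different words. The phoneme labels and the five-word lexicon in this Python sketch are informal, illustrative assumptions and are not drawn from the references.

```python
from typing import Dict, List, Set

# Toy pronunciation lexicon; the phoneme symbols are informal labels.
LEXICON: Dict[str, List[str]] = {
    "cut": ["k", "ah", "t"],
    "tuck": ["t", "ah", "k"],
    "cat": ["k", "ae", "t"],
    "tack": ["t", "ae", "k"],
    "act": ["ae", "k", "t"],
}


def phoneme_inventory(lexicon: Dict[str, List[str]]) -> Set[str]:
    """Collect the distinct phonemes used anywhere in the lexicon."""
    return {p for phones in lexicon.values() for p in phones}


def words_for(phonemes: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Look up the words whose pronunciation matches a phoneme sequence."""
    return sorted(w for w, phones in lexicon.items() if phones == phonemes)


print(len(LEXICON), "words from", len(phoneme_inventory(LEXICON)), "phonemes")
print(words_for(["k", "ah", "t"], LEXICON))
```

Here five words are built from only four phonemes. Adding a word to such a system means adding a lexicon entry rather than training a new whole-word model.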


O’Shaughnessy [Ref. 14] describes in detail how phonemes work. The following
is a brief synopsis of the basics. The articulation of a phoneme produces a physical sound
called a phone. An infinite number of phones can correspond to any particular phoneme
because the vocal tract can vary in an infinite number of ways. Allophones are a class of
phones corresponding to a specific variant of a phoneme [Ref. 14: p. 56].


The ideal voice recognition system is a system that is speaker-independent,
supports continuous speech, has a very large vocabulary (about 60,000 or more words),
and uses synthesized speech as an interface between the computer and the user. This
ideal system is not yet realized in practice.


4. Dictation Vs. Navigation Software

Dictation is the process of using voice recognition as an input method when using 
word processing software. There are really two types of voice dictation systems that can 
be envisioned, differentiated by where the user’s attention is focused. In the classic voice- 
activated-typewriter case, the user is focused on both the computer and the information 
being input into the system. This enables practically immediate error correction, and the 
system is able to prompt the user for information in the case of unclear or ambiguously 
identified words. The other case is when the user has his attention focused elsewhere, 
and he is basically “thinking out loud” and the computer is capturing those thoughts. 

Navigation software, or voice command software, is used to open and close
applications within the operating environment. It is also used to perform menu
commands within those applications. This type of software is basically a command and
control tool that is activated by voice. An example of this is the Microsoft Sound System
for Windows VoicePilot software application. This application is used within the
Windows operating system to “navigate” by opening and closing Windows-compatible
applications. It captures commands from the menus of the applications and adds them to a
specific vocabulary which it creates for that particular application.


III. DRAGONDICTATE FOR WINDOWS VERSION 1.3


A. DESCRIPTION 

DragonDictate is a combined navigator/dictation software package. Version 2.0 is
the latest version offered (available since January 1996). The particular version
implemented is the Classic Version, which uses a 30,000 word vocabulary. DragonDictate
version 1.3 was installed on an IBM PC compatible computer with a Pentium processor
running at 90 MHz, 16 MB of RAM, a Sound Blaster 16 sound card, and a color monitor.


1. Installation and Setting Up 

Installation of DragonDictate for Windows is very simple. The instructions
included with the software are clear and concise. Installation of the software is
similar to the installation of any other Microsoft Windows application. The primary
diskette is inserted into the primary drive while the user is working in Windows. While in
Windows, go to the Windows Program Manager, click on File|Run, and type
A:\setup.exe, where “A” is whatever your primary drive is called. The installation
program will begin and all that is required is to follow the on-screen instructions. It is
recommended to preload everything when given the choice, because this will enable the
user to add new users without having to bother with inserting any diskettes after the initial
installation is completed.

The entire program will require about 24 megabytes of hard disk space and about
12 megabytes of RAM (if you plan on having more than two users). The user’s guide lists
the following system requirements for installing DragonDictate Classic edition [Ref. 2: pp.
2-3]:

1. One of the following sound cards:
   A. IBM® M-ACPA (M-Audio Capture and Playback Adapter)
   B. Creative Labs, Inc. Sound Blaster 16™
   C. Media Vision™ ProAudio Studio 16™
   D. Microsoft® Windows™ Sound System

2. At least an IBM 486/33 MHz PC or compatible computer

3. And the following requirements for the Classic edition (30,000 words):
   A. 24 MB + 9 MB per user after the first user
   B. 10.5 MB RAM, which includes 3 MB of memory required by Windows

4. 3.5 inch, 1.44 MB (high-density) floppy drive

5. Microsoft Windows 3.1

6. MS-DOS or PC DOS, version 3.1 or higher

7. Color or grayscale monitor

8. Mouse recommended
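The per-user storage rule in the requirements above can be written as a one-line calculation. The following is only a sketch: it assumes that the "24 MB + 9 MB per user after the first user" figure refers to hard disk space (consistent with the 24 megabytes quoted in the text), and the function name is hypothetical.

```python
def classic_disk_space_mb(num_users: int) -> float:
    """Estimated disk space (MB) for the Classic edition.

    Assumption: 24 MB covers the program and the first user, and each
    additional user profile adds 9 MB, as listed in the requirements.
    """
    if num_users < 1:
        raise ValueError("at least one user is required")
    return 24.0 + 9.0 * (num_users - 1)


if __name__ == "__main__":
    for users in (1, 2, 3):
        print(f"{users} user(s): {classic_disk_space_mb(users):.0f} MB")
```

Under this reading, three user profiles would need roughly 42 MB of free disk space.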


After installation, DragonDictate prompts the user to enter a name for the
individual who will be using the software (Figure 2). This is a required step in order for
the user to begin using the software. The software will create a user profile and install a
minimum vocabulary for the particular user specified. DragonDictate will also prompt the
user to identify the type of microphone/headset to be used in conjunction with the software
(Figure 3). The user is allowed to select from three types of headsets, which include two
Shure models (the SM10A headset and the VR230B headset) and one Dragon Systems headset
(the Dragon/Primo headset), or a selection labeled “I don’t know.” The two Shure models
are recommended because they are particularly sensitive to sound and tend to produce
very good quality input for the software. The user is then asked to go through a tutorial
(Figure 4) and to perform the “Quick Training.”

Figure 2. Adding A New User





Figure 3. Microphone Selection


It is highly recommended to go through the tutorial. The tutorial gives the user a
quick crash course in simple commands and dictation practice for use in DragonDictate.
This gives the user a feel for how the software behaves and how it interacts with different
applications, including Windows Notepad and Calculator. After completing the tutorial,
DragonDictate asks if you would like to do the “Quick Training.” It is recommended to
do the quick training session at this time. This is a required step in order for
DragonDictate to recognize your speech and it also makes DragonDictate easier to use.
Figure 4. Tutorial Window





2. Training 

Training is required once you have created a user profile. The Quick Training
Window (Figure 5) allows you to set the intensity of the training. You can also set
the repetition level and enable or disable the “Only Listen for Word Being Trained”
selection. Total training time is about 20 minutes at the default setting (“Light”), but may
take up to 90 minutes at the “Intense” setting.


Quick Training involves training four groups of vocabulary types. These groups
are “Correction Words,” “Common Commands,” “Dictation Words,” and “Additional
Words.” It is recommended that all four groups be trained, but they need not be completed in one
sitting. The Quick Training session can be started, stopped, and restarted when necessary.
Completed training is never lost once it has been done, and training is always picked up
where you previously left off. During training DragonDictate constantly adapts to your
speech. This enables DragonDictate to constantly adjust the number of words required to
be trained within each group. Thus, as training progresses DragonDictate will adjust the
number of “Common Commands” required to be trained. This is why you may see the
count of words to be trained decreasing during the training session. Once training has
been completed, DragonDictate is ready for use with any application that is Windows
compatible.


3. Using DragonDictate

Before using DragonDictate, it is necessary to make sure that the microphone is
properly adjusted for giving commands. The microphone should be situated about two
inches away from the corner of the mouth of the user [Ref. 2: p. 22]. A headset
microphone is recommended, optimally one of the three headsets listed in the microphone
selection dialogue box.

To begin using DragonDictate you must make sure that the microphone/headset is
turned on by ensuring that the microphone window on the voicebar is either gray or
yellow. The gray color indicates that DragonDictate is in a waiting mode (asleep), and the
yellow color indicates that DragonDictate is ready and listening for a command. After
ensuring that the microphone is turned on, the user may begin to use DragonDictate to
navigate Windows applications or to dictate into Windows-compatible word processing
applications.
Figure 5. Training console

B. EVALUATION OF DRAGONDICTATE

The evaluation of DragonDictate was done in two stages. The first stage 
evaluated the dictation accuracy and learning capability of DragonDictate. The second 
stage evaluated the ease of navigation performed while working in Windows. The 
navigational ability of DragonDictate was evaluated by noting how well the software was 
able to accommodate opening and closing various Windows applications. The Windows 
applications used were Lotus 1-2-3, WordPerfect, MatLab, Netscape, Eudora, and 
Windows Program Manager. 

1. Dictation Evaluation 

The dictation and learning capability of DragonDictate were measured by dictating
a standard passage consisting of 313 dictation words and commands into WordPerfect
using DragonDictate. The passage was dictated six times. For each trial the number of
mistakes and the length of time required to complete the dictation were recorded, and the
mistakes were corrected as they occurred (using the technique described in the
DragonDictate User’s Guide [Ref. 1: pp. 20-28]). The errors were calculated as a fraction
of the total number of commands to give a percentage for each error type as well as for the
total number of errors. For this study there were four types of mistakes that could be
measured, which are listed and described below:

1. Type 1 - The software recognizes the wrong word or command but the
   correct word or command is located in the choice list.

2. Type 2 - The software recognizes the wrong word or command but the
   correct word or command is not located in the choice list.

3. Type 3 - The software heard nothing even though a word or command was
   uttered.

4. Type 4 - The software heard the correct word or command but performed
   the wrong action or did nothing.


These measures of performance were taken against the passage in Appendix D, 
which was dictated into WordPerfect. The results are depicted in Figures 6 and 7. 
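The error bookkeeping described above can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used to tabulate the results: the function name and the sample error log are hypothetical, and accuracy is taken as 100 minus the total error percentage, which is how the thesis defines it.

```python
from collections import Counter

ERROR_TYPES = (1, 2, 3, 4)  # the four mistake types defined in the text


def summarize_trial(total_items, error_log):
    """Return per-type error percentages, total error percentage, and accuracy.

    total_items: number of words and commands dictated in the trial.
    error_log:   one entry (1-4) per mistake observed during the trial.
    """
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    unknown = set(error_log) - set(ERROR_TYPES)
    if unknown:
        raise ValueError(f"unknown error types: {sorted(unknown)}")

    counts = Counter(error_log)
    per_type_pct = {t: 100.0 * counts[t] / total_items for t in ERROR_TYPES}
    total_error_pct = 100.0 * len(error_log) / total_items
    return {
        "per_type_pct": per_type_pct,
        "total_error_pct": total_error_pct,
        "accuracy_pct": 100.0 - total_error_pct,
    }


if __name__ == "__main__":
    # Hypothetical trial: 200 dictated items with 8 logged mistakes.
    print(summarize_trial(200, [1, 1, 2, 3, 4, 4, 4, 1]))
```

Keeping the raw log of error types, rather than only the totals, makes it possible to see which kind of mistake shrinks from one trial to the next.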
Figure 6 shows that with each trial, the number of errors made by DragonDictate
decreased. The number of type 2 errors decreased with each trial because words not
previously listed in the choice list became candidates within it.
Eventually these words became the primary, or first selection,
choices in the list. This means that they became the words that were recognized by
DragonDictate as the input words uttered by the user. The other error types also became less
frequent, thus contributing to the reduction in the overall number of errors made by
the software.


Figure 6. Number of recognition errors performed Vs Trials


Figure 7 demonstrates that with each use DragonDictate generally improved in its
accuracy¹. This supports Dragon Systems, Inc.’s claim that DragonDictate performance
improves with usage. The greatest degree of accuracy reached during this evaluation was
98.03%. This was achieved within a controlled environment where the user was able to
control the level of background noise. During this evaluation there was very little to
no background noise present. With some background noise (a maintenance man
drilling in the adjacent room with the door closed) DragonDictate achieved an accuracy of
over 90%.

¹ Accuracy is defined as the complement of the total percentage of errors, that is, 100 minus the percentage of errors.
Figure 7. DragonDictate Accuracy Vs Trials


Along with the improvement in accuracy, the amount of time required to dictate
the control passage decreased (Figure 8). As shown, with each successive use of
DragonDictate, the length of time required to input the control passage was reduced. This
was due to the improved level of accuracy. As accuracy improved, the user was able to
increase the speed at which he dictated the text, and less time was expended correcting errors
made by the software. The longest input time was 20:35 (mm:ss) with an accuracy
of 72.73%; the fastest input time was 9:45 with an accuracy of 98.03%.
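To put the two input times in perspective, they can be converted into an approximate dictation rate. The sketch below assumes that every trial covered the full 313-item control passage (words and commands counted together); the helper names are hypothetical.

```python
PASSAGE_ITEMS = 313  # dictation words and commands in the control passage


def to_minutes(mmss: str) -> float:
    """Convert a 'mm:ss' string to minutes."""
    minutes, seconds = mmss.split(":")
    return int(minutes) + int(seconds) / 60.0


def items_per_minute(items: int, mmss: str) -> float:
    """Average number of passage items entered per minute."""
    return items / to_minutes(mmss)


if __name__ == "__main__":
    slowest = items_per_minute(PASSAGE_ITEMS, "20:35")
    fastest = items_per_minute(PASSAGE_ITEMS, "9:45")
    print(f"slowest trial: {slowest:.1f} items/min")
    print(f"fastest trial: {fastest:.1f} items/min")
    print(f"speed-up: {fastest / slowest:.2f}x")
```

On these figures the input rate rises from roughly 15 to roughly 32 items per minute, a little more than a twofold improvement, with the time spent correcting errors included in both measurements.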


Figure 8. Accuracy Vs. Input time

2. Navigation Evaluation

Navigation with DragonDictate was flawless. All that was required to ensure
reliable navigation was that the program being controlled by voice was properly added to
the DragonDictate program group in Windows. The technique for doing this is described
in the User’s Guide [Ref. 1: p. 18]. It is also necessary to ensure that the program being
controlled is placed within the group and properly named. For example, WordPerfect 6.1
need only be named WordPerfect, while Lotus 1-2-3 may still be named Lotus 1-2-3.
Other non-supported programs may still be controlled by training the name of the
program. For example, Matlab is not supported and therefore is not part of
DragonDictate’s vocabulary. The user must therefore train this particular
word in order to start the program by voice. However, it is not necessary to train any of
the commands within the menus of non-supported programs. DragonDictate is capable of
tracking all of the commands within the menu and many of the button-controlled
commands as well.
C. SUMMARY 

DragonDictate performed very well as an input device for the Windows operating 
environment. As a dictation input into word processing software and in conjunction with 
Matlab it proved to be outstanding. After some continuous use the software was able to 
adapt to the user's speech patterns and was able to improve accuracy to 98.03% within a 
quiet test environment. DragonDictate maintained an accuracy of over 90% in a noisy 
environment. The noisy environment was caused by a maintenance man drilling into a 
wall adjacent to the lab in which the evaluation was being performed. As a navigational 
input for Windows it performed equally well, though more work was required by the user 
in order to ensure that non-supported program applications were able to be initiated by 


voice. This procedure is described in detail in Appendix C. 
IV. NAVIGATION SOFTWARE


Voice navigation software is basically a command and control type of application,
as previously explained in Chapter II of this paper. It allows the user to open and close
applications and to perform many menu-driven commands within them. The two navigation
software packages implemented and evaluated for this study are Microsoft’s Voice Pilot
2.0 - a part of the Windows Sound System software package - and Command Corp.’s IN³
Voice Command For SPARCstation. The latter was installed on a SPARCstation
running SunOS 4.1.3.


A. MICROSOFT WINDOWS VOICE PILOT 2.0 

Voice Pilot works with the Microsoft Windows 3.x operating systems. It is
compatible with all MS Windows-compatible applications. Once installed, the application
is fairly easy to use. It comes with several “wizards” - macros that automate or simplify
the setup or usage of an application - which enhance its simplicity (Figure 9). These
macros aid in creating voice commands and new vocabularies, setting user preferences,
and training voice commands.

1. Installing and Setting Up Voice Pilot

Implementation of Voice Pilot is quite easy. To install Voice Pilot, simply insert 
the diskette into the drive and using program manager click File|Run, then type 
A:\setup.exe, where “A” is the letter of the drive that the diskette is in, and simply follow 
the onscreen directions. The program requires a minimum of 10 megabytes of free hard 
disk space and about 2 megabytes of RAM and the following system requirements [Ref. 


yD. 1x |: 
1. An 8 or 16 bit Sound Blaster™ compatible sound card.

2. Microsoft Windows operating system version 3.1 or later.

3. An 80386SX or better IBM® compatible PC operating at 25 MHz or faster.

4. A microphone/headset.


Figure 9. Voice Pilot Voice Wizards

During the initial setup the user is given the chance to set up a “Switch To” group
that is used to store the application programs that the user wishes to navigate using voice
commands. This group appears on the desktop as another application group except that it
has the user’s name as the title (Figure 10). This is a very important feature. The “Switch
To” group name must match the user’s name exactly. This “name matching” is how the
program knows which user is associated with each specific “Switch To” group. The process for adding
and removing applications within the “Switch To” group is the same as that required to
add new program items to the DragonDictate program group, as described in Chapter III.

Figure 10. Switch To group


2. Training Voice Pilot


After completing the installation and creating the “Switch To” group, Voice Pilot
is ready for voice training. Before any commands will be recognized, voice training must
be completed. The program includes a default vocabulary that is automatically loaded and
trained at the beginning of voice training. Voice training is easily completed by using the
voice training wizard included in the software. The training window (Figure 11) is very
similar to the training window in DragonDictate, and it is very easy to use.

Figure 11. Training Window
Once you have completed voice training, you can create new vocabularies by
opening applications and then maximizing or opening Voice Pilot while the application is
still open. Voice Pilot allows you to create vocabularies by automatically extracting
them from the open application (Figure 12). Voice Pilot will automatically create the
vocabulary from all available menu items within the target application. Voice Pilot will
select commands as far down as three or four levels of menu items. It also allows you to
make any particular vocabulary a shared vocabulary or a private vocabulary. A shared
vocabulary is available for use by any and all users that have access to Voice Pilot. A
private vocabulary is only available to the user that created the particular vocabulary.
Voice Pilot will then notify the user that the new vocabulary contains untrained commands
and will allow the user to immediately train those commands.
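The idea of building a vocabulary by walking an application's menus down to a fixed depth can be illustrated with a short sketch. How Voice Pilot actually reads the menus is not described here, so the nested-dictionary menu model, the function name, and the sample menu below are all assumptions made for illustration.

```python
def extract_commands(menu, max_depth=4, _depth=1):
    """Collect menu labels depth-first, ignoring items nested deeper than max_depth.

    menu: dict mapping a menu label to a dict of its sub-items
          (an empty dict marks a leaf command).
    """
    commands = []
    if _depth > max_depth:
        return commands
    for label, submenu in menu.items():
        commands.append(label)
        if submenu:
            commands.extend(extract_commands(submenu, max_depth, _depth + 1))
    return commands


# Hypothetical menu tree for a word processor.
SAMPLE_MENU = {
    "File": {
        "Open": {},
        "Save As": {},
        "Print": {"Setup": {"Advanced": {"Hidden Option": {}}}},
    },
    "Edit": {"Copy": {}, "Paste": {}},
}

if __name__ == "__main__":
    vocabulary = extract_commands(SAMPLE_MENU, max_depth=4)
    print(vocabulary)
```

With a depth limit of four, the fifth-level item "Hidden Option" is left out of the vocabulary, which mirrors the three-or-four-level limit noted above; each extracted label would then still need to be trained before it could be recognized.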


Figure 12. Creating a new vocabulary


The user is allowed to select either “Quick training” or “Untrained words.” Quick
training consists of 52 commands that are common to most Windows-compatible
programs and are the same for all applications. “Untrained words” are nominally 72
commands available for a specific application that have not been trained. These
commands are extracted automatically from the available menu commands of the
application. The number of words for a selected group of applications is listed below in
Table 1. The time for the “Quick Training” vocabulary was the same for all applications: 7
minutes 35 seconds. There were no errors².









Table 1. Applications trained using “Untrained words” selection.
[Surviving row labels: WordPerfect, Program Manager; the word counts are not recoverable.]
B. IN³ VOICE COMMAND FOR SPARCSTATION

IN³ Voice Command by Command Corp. works under all audio-equipped
SPARCstations using the following operating systems [Ref. 15: p. 2]:

1. OpenWindows 3.x

2. Solaris 2.x (SunOS 5.x)

3. Solaris 1.x (SunOS 4.1.2 or 4.1.3)

4. SunOS 4.1.1 - disregard warning messages from ld.so that libc.so.1.6 has an
   older revision than expected.

IN³ speech recognition technology uses voice templates created for each command
and stores them in a lexicon. When in recognition mode, the program compares the
templates and matches them to the input data coming from the microphone [Ref. 15: p. 6].
The software performs these comparisons continuously and in real time. It is for this
reason that it is important to create these templates in a quiet environment with a strong
voice signal. Such templates will normally be well-matched and correctly recognized in an
environment with typical office noise.

² Errors were words that required re-training due to background noise or I/O errors. These words were
identified to the user by VoicePilot.
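Template-based recognition of this kind can be illustrated with a small nearest-template matcher. The source does not say which comparison algorithm IN³ uses; the sketch below substitutes dynamic time warping (DTW), a classic technique for comparing utterances spoken at different speeds, and treats each utterance as a short list of made-up feature values. All names and numbers are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, lexicon, reject_above=None):
    """Return the command whose template is closest to the utterance.

    lexicon: dict mapping a command name to its stored template.
    reject_above: optional distance threshold; worse matches return None.
    """
    best_name, best_dist = None, math.inf
    for name, template in lexicon.items():
        dist = dtw_distance(utterance, template)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if reject_above is not None and best_dist > reject_above:
        return None
    return best_name


# Hypothetical templates: each list stands in for per-frame acoustic features.
LEXICON = {"open": [1, 3, 5, 3, 1], "close": [5, 4, 3, 2, 1]}

if __name__ == "__main__":
    slow_open = [1, 1, 3, 5, 5, 3, 1]  # same shape as "open", spoken more slowly
    print(recognize(slow_open, LEXICON))
```

A noisy template distorts every later comparison against it, which is one way to see why the templates should be recorded in a quiet environment with a strong voice signal.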


1. Installing IN³ for SPARCstation

Installation of IN³ for SPARCstation was performed by a network administrator.
Installation on individual SPARCstations is fully described in the user’s guide. The
workstation itself should be audio-equipped. The workstation must have the necessary
hardware and software installed to permit use of audio input and output. Upon
completion of the installation IN³ is ready for use.


2. Using IN³ Voice Command

Once IN³ is started by using the command “in3,” the application performs a
microphone check. The application requires that the user either opt to perform the
microphone check, play back a sample (not available on initial use), or select continue.
The “Mic Check” button allows the user to create a voice sample by saying the phrase
“Sun Test.” This is repeated several times to allow IN³ to adjust the microphone
gain. This voice sample is then used as the playback sample. It is not necessary to
play back the sample in order to begin using IN³, but it is a good idea to play it back so
that the user is able to hear the quality of the input being used as a template.


After completing the initial microphone check it is necessary to load a lexicon (set
of commands) that becomes the active vocabulary. This is done by selecting “File” and
then “Load Starter Lexicon.” This opens a list of available lexicon files that are provided
with the application. There are several to choose from, including lexicons for
OpenWindows (openw.vcb), Frame Maker (framestart.vcb), and Vi (vi.vcb). Once a lexicon is
loaded, the list of available commands within the selected lexicon is displayed in the main
window (Figure 13). The window shown lists templates that have not been created.

Figure 13. IN³ Main window.





Many users will find that the lexicon sets provided do not provide the flexibility to
go from one program or application to another without having to reload the proper
lexicon. To solve this problem, simply include other lexicons in the current lexicon. This
increases the size of the lexicon, and thus increases the size of the available vocabulary.
This is a feature unique to the SPARCstation version of IN³, and allows the user to have
just a single lexicon of unlimited size. The PC version does not allow the use of just one
large lexicon. The limit is due to the memory requirements and processing limitations of
the PC.


3. Building Templates 

Once the user has loaded the desired lexicon(s), it is necessary to train the
commands (build templates). Building templates is quite simple. The user must select
“Edit,” and then “Build Templates” from the IN³ window menu. Once the “Build
Templates” dialogue is started, IN³ will set either the “All” or “Selected” mode (Figure
14) depending on whether or not there are templates in the lexicon that are already
trained. If templates exist then the “Selected” mode is set. If no templates are available
then the “All” mode is set. In the “Selected” mode only those templates that require
training are created. The user must then select “Create” to train templates.

Figure 14. Template creation in "All" mode.


IN³ begins to train (create) templates when the “Begin” button is selected. The
“Begin” button then becomes a “Pause/Resume” button. IN³ then begins the training
(creation) dialogue. The user is asked to say commands (templates) several times, usually
twice, unless a command is not recognized or there is a problem with the microphone
input. IN³ senses input deficiencies and then notifies the user. Should the user pause too
long between utterances, IN³ detects this as an error and notifies the user of input
deficiencies. The user is then allowed to correct the error or continue training. After
training each command, IN³ then goes through the entire list of words/commands trained
and asks the user to repeat them. IN³ performs this routine to ensure proper training has
occurred. It also catches any errors made during training and retrains the command at this
time. The entire process for 116 commands took only 10 minutes and 33 seconds.


4. Adding And Editing Commands

Editing, adding, and modification of commands is performed in the Edit
Commands window (Figure 15). This window allows the user to modify, delete, or reset
the selected command and also to add any new commands. When adding a new
command the user begins by clicking the “Edit” menu choice, then selecting the “Edit
Command” menu selection. This starts the “Edit Command” dialogue window. The user
must then either type in the name of the new command or select a command listed in the
IN³ main window. The complete method for adding and editing commands is given in
the user’s guide. The specific key combination or mouse movements and button
presses/clicks are programmed into the “Keystrokes:” box by using the “Window/Pointer
Probes:” macro. By selecting “Names” and then changing the focus to the desired
application window and clicking the left mouse button, the user is able to capture the
name of the application. By using the “Packing” button, the user is able to track (capture) the
mouse movements and button clicks.


IN³ is aware of which commands may be executed with each particular
application, or X-aware as described by Command Corp. This allows the user to build a
single large lexicon containing an unlimited number of templates to control the most used
functions and applications by voice without having to switch between lexicons. Thus it is
possible to have all voice commands needed by the user located in one vocabulary file.
The user simply continues to add commands and templates to his lexicon.


IN³ controls application startup by one of three methods: using a windows
management mode, an application execution mode, or by using embedded commands. In
windows management mode the command is preceded by “fwmm” and then the command
is typed. For example, to start the Shell tool the user would type “fwmm shelltool {CR}”
in the “Keystrokes:” box in the “Edit Command” dialogue window. The windows
management mode execution method will start up the Shell tool by 1) maximizing the
shelltool if it is currently running as an icon, 2) bringing the shelltool to the front if it is
running but is hidden under other open windows, or 3) starting the shelltool application if
the shelltool application is not currently running.
Using the application execution mode, IN³ will start a new instance of the
application even if the application is currently running. In this mode the command is
preceded by “fexec” and then the command is typed. Using the previous Shell tool
example, in this mode the command would be “fexec shelltool {CR}”. This command
would start a brand new instance of shelltool whether or not the application was already
running.
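The difference between the two startup modes can be summarized in a short sketch. It models the desktop as a plain list of window records and does not call any real window-manager or IN³ interface; the function names, the record layout, and the handling of a window that is already in front are assumptions added for illustration.

```python
def window_management_mode(desktop, app):
    """Reuse an existing window of `app` if there is one, otherwise start it.

    desktop: list of dicts such as {"app": "shelltool", "state": "icon"},
             where state is "icon", "hidden", or "front".
    Returns a short description of the action taken.
    """
    for window in desktop:
        if window["app"] == app:
            if window["state"] == "icon":
                action = "opened icon"
            elif window["state"] == "hidden":
                action = "brought to front"
            else:
                action = "already in front"  # case not covered in the text
            window["state"] = "front"
            return action
    desktop.append({"app": app, "state": "front"})
    return "started"


def application_execution_mode(desktop, app):
    """Always start a new instance, whether or not `app` is already running."""
    desktop.append({"app": app, "state": "front"})
    return "started"


if __name__ == "__main__":
    desktop = [{"app": "shelltool", "state": "icon"}]
    print(window_management_mode(desktop, "shelltool"))      # reuses the icon
    print(application_execution_mode(desktop, "shelltool"))  # second instance
    print(len(desktop), "shelltool windows now exist")
```

The practical consequence is that a voice command defined in windows management mode can be repeated without piling up extra windows, while one defined in application execution mode opens another copy every time it is spoken.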

Using embedded and conditional commands allows the user to specify the
conditions under which an application is started. IN³ has 15 recognized embedded
commands and two conditional commands (“True” and “False”) that can be used together
in many different combinations. This allows the user to have greater flexibility and control
when navigating applications. For example, {Front:t:/usr/spool/mail} is used to bring the
Mailtool window titled “/usr/spool/mail” to the front of the display screen and to give that
window the focus for command execution. This is very useful when using applications
that utilize multiple windows, such as Mailtool. Mailtool opens multiple windows to view
or to compose mail. Using the “Front” example above allows the user to execute
commands in those windows without having to use the mouse to switch the focus to that
particular window.


Using the conditional commands adds even greater functionality to the voice
commands. The combination {Front:all:bob} {False:open:bob} {False:exec:cmdtool -name bob}
does three things. First it attempts to bring forward a window called “bob.”
If that fails it tries to open an iconified window named “bob.” Should that fail, it starts the
Commandtool and tells it to use the name “bob” as its resource name.
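The fallback behavior of the “bob” example can be mimicked with a toy interpreter. This is not IN³'s implementation: it assumes that a “False” step runs only when the most recently executed step failed, that a skipped step leaves that result unchanged, and it replaces the real window operations with a dictionary of window states. The exec step is simplified to take only the window name.

```python
def make_actions(desktop):
    """Build toy actions over a dict mapping window name -> 'front', 'hidden', or 'icon'."""

    def front(name):
        # Succeeds only if an open (non-iconified) window with that name exists.
        if desktop.get(name) in ("front", "hidden"):
            desktop[name] = "front"
            return True
        return False

    def open_icon(name):
        # Succeeds only if the window exists as an icon.
        if desktop.get(name) == "icon":
            desktop[name] = "front"
            return True
        return False

    def exec_new(name):
        # Starting a new tool under that name always succeeds in this model.
        desktop[name] = "front"
        return True

    return {"Front": front, "open": open_icon, "exec": exec_new}


def run_chain(chain, actions):
    """Run (condition, action, argument) steps; condition is None, True, or False."""
    last_ok = True
    executed = []
    for condition, action, arg in chain:
        if condition is not None and condition != last_ok:
            continue  # condition not met: skip the step, keep the last result
        last_ok = actions[action](arg)
        executed.append(action)
    return executed


# Simplified form of {Front:all:bob} {False:open:bob} {False:exec:cmdtool -name bob}
CHAIN = [(None, "Front", "bob"), (False, "open", "bob"), (False, "exec", "bob")]


def demo(initial):
    desktop = dict(initial)
    return run_chain(CHAIN, make_actions(desktop)), desktop


if __name__ == "__main__":
    for start in ({}, {"bob": "icon"}, {"bob": "hidden"}):
        print(start, "->", demo(start))
```

In all three starting situations the chain ends with a window named “bob” in front, but it only does as much work as the situation requires.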
Figure 15. Edit Command dialogue window

C. EVALUATION

The evaluation of both IN³ and VoicePilot consisted of giving navigational
commands and taking note of all errors that occurred. Usability and the ease of training the
vocabulary and adding commands were also taken into consideration while evaluating
both software packages.


1. VoicePilot Version 2.0 

VoicePilot performed reasonably well in a moderately quiet environment.
Moderately quiet means in this case that the environment was less quiet than that of a
normal office. In this environment the navigational ability of VoicePilot was nowhere
close to the level of accuracy that would be required in a noisy shipboard environment.
Figure 16 depicts the range of accuracy of VoicePilot over a period of six trials. Using
114 trained commands within supported programs (MS Word, WordPerfect, and Program
Manager), VoicePilot was evaluated by actually navigating the supported Windows
applications. The maximum accuracy reached by VoicePilot was 77.77%. Most users
would probably desire a minimum of 90% accuracy. Any less than that and it would be
easier to do navigation by hand.

Figure 16. VoicePilot accuracy
The errors made by VoicePilot were categorized into three types: 1) commands
that were unrecognized or not heard by VoicePilot, 2) commands that could not be
corrected within the VoicePilot correction dialogue, and 3) commands that were
incorrectly recognized by VoicePilot, which resulted in unwanted actions being performed
by the software. The percentage of each error type as a part of the overall number of
errors is shown in Figure 17.

Figure 17. Error type percentages


The number of errors made by VoicePilot that resulted in some unwanted action
was very high, as shown in Figure 17. Though no major setbacks were experienced, the
potential for disaster is considerable. Although the majority of the errors were corrected,
many of the commands could not be corrected using VoicePilot’s correction
dialogue window. There appears to be no true pattern of improvement. The same
commands can be incorrectly recognized time after time, even with corrections being
made. Even then the same errors still occur, and sometimes the correct replacement for the
word that VoicePilot recognized is not listed among the choices of commands.
a. Adding Vocabularies and Commands

Adding new vocabularies was simple and quick with VoicePilot. All the
user needed to do was to open the application for which the new vocabulary was to be
used and then open VoicePilot. After opening VoicePilot, the user needed to choose the
menu item “Vocabulary” and then choose “New Vocabulary.” Once in this dialogue the
user need only choose the target application and check the radio button for adding the
new vocabulary by automatic extraction (Figure 12). VoicePilot then extracts the
vocabulary from the menu items of the target application and offers to allow the user
to conduct training for the new vocabulary of commands. The new vocabulary will be
opened automatically by VoicePilot any time that the associated application is started
while VoicePilot is active.

Adding individual voice commands is a different series of operations. To do this, the user chooses “New Voice Command” from the “Vocabulary” menu. VoicePilot then opens the “Add New Command” dialogue window (Figure 18). The user then selects the application with which the new command is to be associated, the name of the new command, and the keystrokes that the command is to replace. This is a very easy way of creating a new command, though being able to record mouse movements and then substitute them with a voice command would probably be much easier. Not every user is going to be familiar enough with every application to know exactly which keystrokes perform which function, and most functions are easily accomplished by pressing a button on a toolbar with the mouse. The user must then train the new command in order for it to be recognized by VoicePilot.
b. Ease of Use

VoicePilot’s interfaces made the program extremely “user-friendly”; that is, the program was not hard for even the novice computer user to operate. The many “wizards” included with the program made training and adding new vocabularies even simpler. The “User Preferences” wizard enabled the program to optimize its settings just by asking the user to say nine phrases (standard phrases that were the same each time the wizard was used) into the microphone/headset. The user never had to worry about manually setting any sound card settings or voice input levels. Though there is a manual setting choice, it was never used. The software alerts the user if the automatic setting cannot be made and then instructs the user to set the device input level manually.
Figure 18. Add New Command window
2. IN³ Voice Command

IN³ Voice Command performed very well under the same environmental conditions as VoicePilot. IN³ Voice Command was installed and operated on a SPARCstation using SunOS 4.1.3 and OpenWindows version 3. An Audio-Technica MT858 microphone was used as the input device. The microphone was very sensitive and could pick up the low-pitched whine of the CPU cooling fan inside the SPARCstation. The user was able to position the microphone up to two feet away and still have a good input signal for the operation of IN³.


A total of 114 commands, drawn from the vocabulary listed in Appendix B, were used to evaluate IN³. The accuracy of IN³ was very poor during the initial use of the application. With continued correction of errors and refinement of the voice commands, the accuracy of IN³ improved to 90.91%. Figure 19 shows the progressive improvement of accuracy with each use of IN³. Most users would feel very comfortable using IN³ at 90% or better. With increased use and refinement, the accuracy of IN³ should improve to well over 90%.
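Accuracy in this discussion is the share of spoken commands that were recognized correctly, compared against the 90% comfort threshold. The sketch below shows that arithmetic. The per-session tallies are hypothetical and chosen only to illustrate a progression; they are not the tallies recorded in this study.

```python
def accuracy_percent(correct: int, attempted: int) -> float:
    """Recognition accuracy as a percentage, rounded to two decimals."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    if not 0 <= correct <= attempted:
        raise ValueError("correct must be between 0 and attempted")
    return round(100.0 * correct / attempted, 2)


def meets_threshold(correct: int, attempted: int,
                    threshold: float = 90.0) -> bool:
    """True when accuracy reaches the threshold most users would accept."""
    return accuracy_percent(correct, attempted) >= threshold


# Hypothetical (correct, attempted) tallies for three successive sessions.
sessions = [(60, 110), (85, 110), (100, 110)]
for number, (correct, attempted) in enumerate(sessions, start=1):
    print(number,
          accuracy_percent(correct, attempted),
          meets_threshold(correct, attempted))
```

Tracking the value per session, rather than as one cumulative figure, is what makes a trend such as the one in Figure 19 visible.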
Figure 19. IN³ accuracy over time
The errors made by IN³ were categorized into three types of mistakes: 1) the command was not heard or recognized by IN³, 2) the command was recognized but no action was performed by the software, or 3) the command was recognized but the wrong action was taken by the software.³ As depicted in the chart in Figure 20, even in times of great accuracy the number of errors that resulted in some unwanted action was high. Though most of the unwanted actions were benign and were easily corrected by resetting the movements or modifying the command to perform the correct actions, the consequences of these unwanted actions could potentially be disastrous.
Figure 20. Error type⁴ percentages committed by IN³

3 Wrong action included bringing up the wrong application window and/or executing improper mouse movements or clicks.
4 Error types depicted are a percentage of the total number of errors committed by IN³.

a. Adding New Vocabularies and Commands

Adding new vocabularies in IN³ was as simple as opening the “File” menu and choosing the “Add lexicon,” “Add starter lexicon,” or “Include lexicon” selection. The “Add lexicon” selection adds a template located in the user’s directory. This template could be one of several that the user may have created or modified from the lexicons included with the program. The “Load starter lexicon” selection allows the user to select and load any one of the nine included lexicons. The difference between these lexicons and those added by the “Add lexicon” selection is that the starter lexicons are not yet trained, whereas those loaded using the “Add lexicon” selection may or may not be trained. The “Include lexicon” selection allows the user to add vocabulary commands from different lexicons into one large lexicon, creating one large vocabulary file. The advantage of doing this is that the user does not have to switch templates when different applications are started or selected for use.
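The effect of the “Include lexicon” selection can be pictured as merging several command tables into one. The sketch below is only an illustration of that idea: the dictionary representation, the function name `include_lexicons`, and the rule of keeping the first definition when two lexicons define the same command differently are all assumptions, since the IN³ lexicon file format and its conflict behavior are not described here.

```python
def include_lexicons(*lexicons):
    """Merge command->action tables into one large lexicon.

    Assumption: when two lexicons define the same command with different
    actions, the first definition is kept and the command is reported.
    """
    merged = {}
    conflicts = []
    for lexicon in lexicons:
        for command, action in lexicon.items():
            if command in merged and merged[command] != action:
                conflicts.append(command)
                continue  # keep the earlier definition
            merged[command] = action
    return merged, conflicts


# Hypothetical lexicons for two applications.
mail_lexicon = {
    "Compose": "press the Compose button in Mailtool",
    "Print": "print the current message",
}
browser_lexicon = {
    "Bookmarks": "press the Bookmarks button in Netscape",
    "Print": "print the current page",
}

combined, clashes = include_lexicons(mail_lexicon, browser_lexicon)
print(sorted(combined))
print(clashes)
```

The clash on a shared word such as “Print” suggests why the evaluation vocabulary in Appendix B uses distinct command names (“Print” and “Print Message”) for different applications.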

Adding individual commands is done using the “Edit Command” dialogue as previously described. Learning how to use embedded commands, capture keystrokes, and enable commands to operate within specific applications is the tricky part. Learning the use of embedded commands is almost like learning a new programming language. The examples given in the User’s Guide are not very clear, and the User’s Guide itself reads more like a technical manual than a guide. It is extremely helpful if the user has some basic knowledge of UNIX or OpenWindows. Several calls were made to Command Corp. for technical help on how to program some of the commands, especially commands dealing with applications that use multiple windows. The result of the technical help was the use of the “Front” command previously described. This technique is described in the IN Cube Voice Command for SPARCstation version 2.2.2 Release Notes, which are installed in the /usr/lib/in3/info/ directory in the file “relnotes.ps”. This document contains notes, changes, and corrections to the documentation included in the package with the software.


D. SUMMARY 

In this chapter we have looked at the two navigational software packages evaluated in this study, VoicePilot and IN³ Voice Command. We have seen that both were produced to perform the same type of operations, that is, to navigate between applications in a windows environment. As a navigational input device for a windows operating system environment, VoicePilot was found to be less than desirable due to its low accuracy. In contrast, IN³ performed well as a navigational device, reaching a 90.91% accuracy rate after continued use. Both IN³ and VoicePilot showed a propensity to perform unwanted actions when an error is made in the recognition of a command. This is not an attribute that any user would want. In this study the unwanted actions were benign, but the consequences of such errors in other situations could be disastrous.
V. CONCLUSIONS AND RECOMMENDATIONS


A. CONCLUSIONS 

In the past few years the DoD has placed an emphasis on Command, Control, Communications, Computers, and Intelligence For the Warrior (C4IFTW). C4I is the future for all the military services, and is playing a major role in the planning of future capabilities, makeup, and budgetary issues within DoD. A major factor in C4IFTW is the interface between man and computer. One of the technologies that is “coming of age” is voice recognition. Within a few years (some experts say within the next ten years), giving “orders” or inputting data into a computer by voice may be the normal way of doing business. For C4IFTW, one way to give the computer a common look and feel, so that interfacing with it is almost natural, is to incorporate voice recognition as an interface between the user and the machine.


Voice technology has made great strides within the past three to five years. Manufacturers are beginning to produce voice recognition packages that are ready to use right out of the box. Training commands and vocabularies is optional. These voice recognition packages are being produced to support all of the major computing platforms and operating systems, including MS Windows (versions 3.x, 95, and NT), UNIX, SunOS, OpenWindows 3.x, and even OS/2. With more of the computing industry focusing on multimedia, voice recognition is becoming a more popular technology.


This thesis examined three voice recognition software packages currently available in the commercial market: DragonDictate version 1.3, VoicePilot version 2.0, and IN³ Voice Command for the SPARCstation version 2.2.2. These three packages were implemented on various systems and evaluated. Of these three packages, DragonDictate was the best choice for dictation and navigation. It was shown that DragonDictate’s accuracy improved steadily with increased usage, maintaining an accuracy above 98% in a quiet environment and 93.5% in a relatively noisy environment. The accuracy improved because DragonDictate was able to “learn” the user’s speech patterns and apply corrections to voice commands to avoid future errors. The user needed to perform a twenty-minute initial training, but this was the only extensive training the program needed. Navigational commands did not have to be trained for each specified application. VoicePilot and IN³ Voice Command both required training for each application or command within each vocabulary. DragonDictate was the simplest package to use, as well as the most accurate in recognizing voice commands.

B. RECOMMENDATIONS FOR FURTHER RESEARCH

This thesis provides a preliminary study on the application of voice recognition technology. The following are three areas for further research dealing with applications of voice recognition technology (this is clearly not an exhaustive list of possible research areas involving voice recognition).


1. Voice Recognition and the Internet

This thesis used voice recognition to automate many of the menu and button commands in software used to access the Internet and the World Wide Web, such as Netscape, Mosaic, and FTP tools. However, once connected, many of the functions performed while “browsing” the Web were still done using the mouse. Possible research topics exist in the area of SLAM (Spoken Language Access to Multimedia)⁵ and its possible implementation on a machine at the Naval Postgraduate School.

5 Spoken Language Access to Multimedia (SLAM) is a spoken language extension to the graphical user interface of the World-Wide Web browser Mosaic being developed by the Center for Spoken Language Understanding (CSLU) at the Oregon Graduate Institute. SLAM uses the complementary modalities of spoken language and direct manipulation to improve the interface to the vast variety of information available on the Internet.... SLAM is believed to be the first spoken-language interface to the World-Wide Web to be easily implemented across platforms. [Ref. 16]

2. Service Area Specific Applications Of Voice Recognition

Use of voice recognition in many commercial professional areas has become popular. Research topics can be examined in the possible application of profession-specific voice recognition software in the military counterpart or equivalent Warfare Specialty area, especially under field conditions. Many vendors are currently shipping special editions of voice recognition software with vocabularies specifically created for the medical and legal professions.


3. Use Of Voice Recognition Across Platforms 

A group of people suffering from RSI (repetitive strain injury) have utilized a2x, a piece of public domain software designed to interface the DragonDictate speech recognition system on a PC to a workstation running the X Window System. Research could be performed at the Naval Postgraduate School to utilize a2x to interface voice recognition on a PC to a workstation.
APPENDIX A. DRAGONDICTATE VOCABULARY LIST


This is a sample list of the vocabulary words used in DragonDictate for Windows version 1.3. Bold typeface words in the “command” column are the spoken command words (what the user says to cause the performance of a specific action). The “action” column gives a brief description of what each command does or the resulting action.⁶

6 This convention using “command” and “action” columns is used consistently in all appendices to denote what is uttered and the resulting action.


A. ALWAYS ACTIVE COMMANDS

These commands are available at all times.

Command                         Action

Command Mode                    Sets DragonDictate to Command and Control mode.
Dictate Mode                    Sets DragonDictate to Dictation mode.
Go To Sleep                     Sets DragonDictate to passive mode. The software is
                                not listening for commands.
Wake Up                         Sets DragonDictate to active listening mode. The
                                software is listening for commands or dictation words.
What Can I Say?                 Lists relevant vocabulary for the current application.
Oops                            Starts the DragonDictate correction sequence.


B. GLOBAL COMMANDS

These commands are available for use at all times except 1) when training a word, 2) when the user has already uttered Bring Up, 3) during arrow or mouse movement, or 4) while DragonDictate is in the passive listening mode.

Command                         Action

Bring Up                        Starts an application.
Computer Please                 Puts DragonDictate into temporary Command and
                                Control mode during dictation.
Drop List                       Shows the list in a ListBox.
Move Voicebar                   Moves the Voicebar from one of the four corners of
                                the screen to the next in clockwise rotation.
Type Word                       Begins the macro to allow for typing the word
                                following the command.
Voice Menu                      Switches to the Voice Menu dialogue.


C. WINDOWS COMMANDS

These commands control the attributes of the windows. “Active window” denotes the window possessing the focus of attention.

Command                         Action

Clear Desktop                   Clears the desktop of all open windows.
Close Window                    Closes the currently active window.
Maximize                        Maximizes the currently active window, if not
                                already maximized.
Minimize                        Minimizes the currently active window, if not
                                already minimized.
Move Window                     Grabs focus of the window and allows the window to
                                be positioned by voice using the mouse commands.
Next Window                     Switches focus to the next open window from the
                                current window.
Previous Window                 Switches focus to the previous open window from the
                                current window.
Restore                         Restores attributes of the current window from any
                                change that has occurred, such as minimizing,
                                maximizing, or moving.
Size Window                     Allows for the changing of the size of the window by
                                dragging with the mouse using the mouse commands.
Window Menu                     Opens the current menu of window options.


D. ARROW MOVEMENT

These commands control arrow movement. When the command is spoken the arrow begins to move as required. Stop ends the arrow movement.

Command                         Action

Cancel                          Cancels the current action.
Down                            Moves the arrow or mouse down.
Faster                          Increases the speed/rate of movement of the arrow.
Left                            Moves the arrow in the left direction.
Move Down                       Moves the arrow down, and also moves down a list in
                                a list box.
Move Left                       Moves the arrow in the left direction.
Move Right                      Moves the arrow in the right direction.
Move Up                         Moves the arrow in the up direction.
Move Down 1 ... Move Down 5     Moves the arrow or selection in a list box down 1-5
                                increments.
Move Up 1 ... Move Up 5         Moves the arrow or selection in a list box up 1-5
                                increments.
Right                           Moves the arrow in the right direction.
Slower                          Slows the speed/rate of arrow movement.
Stop                            Stops arrow movement.
Up                              Moves the arrow in the up direction.


E. DICTATION COMMANDS

These commands are used when dictating text.

Command                         Action

Back 1 ... Back 5               Moves the cursor back one to five words by the
                                specified increment.
Begin Capitalize                Types all words with the first letter in uppercase.
Begin Document                  Starts the document. Indents 5 spaces to begin the
                                paragraph and capitalizes the first letter of the
                                next “dictation” word spoken.
Begin Lowercase                 Types all words dictated in lowercase.
Begin No Space                  Prevents DragonDictate from placing a space between
                                words
                                spacebar
Begin Title                     Causes DragonDictate to use title capitalization rules.
Begin Uppercase                 Types all letters of all words in uppercase.
Bottom of Document              Takes the cursor to the bottom of the document.
Capitalize Next                 Capitalizes the first letter of the next word.
End Capitalize                  Stops the actions taken by Begin Capitalize.
End Lowercase                   Stops the actions taken by Begin Lowercase.
End No Space                    Stops the actions taken by Begin No Space.
End Title                       Stops the actions taken by Begin Title.
End Uppercase                   Stops the actions taken by Begin Uppercase.
Lowercase Next                  Types the next dictation word in lowercase.
New Line                        Begins a new line of text. Does not begin the line
                                in paragraph format.
New Paragraph                   Starts a new paragraph. Indents the first line and
                                capitalizes the first word.
No Space                        Suppresses the automatic space between words.
Normal Case                     Types words in normal sentence case.
Scratch That                    Deletes the last word dictated.
Scratch 2 ... Scratch 5         Deletes the number of words stated by the numeral,
                                i.e., Scratch 2 would delete the last two words
                                dictated.⁷
Top of Document                 Takes the cursor to the top of the document.
Uppercase Next                  Types the next dictation word in all uppercase letters.

7 These commands were programmed by the author. Details on how this was accomplished are in Appendix C.


F. MOUSE MOVEMENT COMMANDS

These commands are used to control the mouse movements. Saying Mouse plus the desired direction initiates the mouse movement. Saying Stop ends the mouse movement.

Command                         Action

Button Click                    Clicks the left mouse button.
Button Double Click             Double clicks the left mouse button.
Cancel                          Stops the current command.
Double Click                    Double clicks the left mouse button.
Down                            Moves the mouse cursor down.
Drag Down                       Drags the object/window in the down direction.
Drag Left                       Drags the object/window in the left direction.
Drag Lower Left                 Drags the object/window toward the lower left corner
                                of the screen.
Drag Lower Right                Drags the object/window toward the lower right corner
                                of the screen.
Drag Right                      Drags the object/window in the right direction.
Drag Up                         Drags the object/window in the up direction.
Drag Upper Left                 Drags the object/window toward the upper left corner
                                of the screen.
Drag Upper Right                Drags the object/window toward the upper right corner
                                of the screen.
Faster                          Increases the speed/rate of mouse movement.
Left                            Moves the mouse cursor in the left direction.
Lower Left                      Moves the mouse cursor in the lower left direction.
Lower Right                     Moves the mouse cursor in the lower right direction.
Mouse Down                      Same as Down.
Mouse Left                      Same as Left.
Mouse Lower Left                Same as Lower Left.
Mouse Lower Right               Same as Lower Right.
Mouse Right                     Moves the mouse cursor in the right direction.
Mouse Up                        Moves the mouse cursor in the up direction.
Mouse Upper Left                Moves the mouse cursor in the upper left direction.
Mouse Upper Right               Moves the mouse cursor in the upper right direction.
Right                           Same as Mouse Right.
Right Button Click              Clicks the right mouse button.
Slower                          Slows the speed/rate of mouse movement.
Stop                            Stops mouse movement.
Up                              Same as Mouse Up.
Upper Left                      Same as Mouse Upper Left.
Upper Right                     Same as Mouse Upper Right.
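The mouse movement vocabulary above pairs an optional prefix (Mouse or Drag) with one of eight directions, and several commands are aliases of each other (for example, Up is the same as Mouse Up). A minimal sketch of that structure follows. The function name, the unit step vectors, and the screen convention that y grows downward are assumptions made for illustration; they do not describe how DragonDictate implements these commands.

```python
# Unit steps as (dx, dy); assumes screen coordinates where y grows downward.
DIRECTIONS = {
    "Up": (0, -1),
    "Down": (0, 1),
    "Left": (-1, 0),
    "Right": (1, 0),
    "Upper Left": (-1, -1),
    "Upper Right": (1, -1),
    "Lower Left": (-1, 1),
    "Lower Right": (1, 1),
}


def parse_mouse_command(spoken: str):
    """Split a spoken movement command into (kind, (dx, dy)).

    "Mouse Up" and "Up" both map to a plain move; "Drag Up" maps to a drag.
    """
    words = spoken.split()
    kind = "move"
    if words and words[0] in ("Mouse", "Drag"):
        kind = "drag" if words[0] == "Drag" else "move"
        words = words[1:]
    direction = " ".join(words)
    if direction not in DIRECTIONS:
        raise ValueError(f"not a movement command: {spoken!r}")
    return kind, DIRECTIONS[direction]


for phrase in ("Mouse Upper Left", "Upper Left", "Drag Lower Right"):
    print(phrase, "->", parse_mouse_command(phrase))
```

Treating the prefix and the direction separately keeps the alias pairs in the table consistent by construction, since both spellings resolve to the same direction entry.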


G. SYMBOLS AND PUNCTUATION

These commands are used to type these commonly used symbols and punctuation marks. This is just a partial listing. DragonDictate supports all of the ASCII symbols.

Command                         Action

Ampersand                       Types character “&”
Asterisk                        Types character “*”
At Sign                         Types character “@”
Caret                           Types character “^”
Open Brace                      Types character “{”
Close Brace                     Types character “}”
Open Bracket                    Types character “[”
Close Bracket                   Types character “]”
Open Paren                      Types character “(”
Close Paren                     Types character “)”
Open Quote                      Types the opening quotation mark character
Close Quote                     Types the closing quotation mark character
Comma                           Types character “,”
Dollar Sign                     Types character “$”
Period                          Types character “.”
Pound Sign                      Types character “#”
Slash                           Types character “/”
Backslash                       Types character “\”
Pipe                            Types character “|”
Tilde                           Types character “~”
H. CORRECTION COMMANDS

These commands are used to correct errors.

Command                         Action

Cancel                          Cancels the current action.
Choose 1 ... Choose 10          Selects the numbered word from the list of possible
                                words heard by DragonDictate and then returns the
                                user to Dictate mode.
Edit 1 ... Edit 10              Allows the user to edit the selected word. Also opens
                                a list of words derived from or similar to the
                                selected word.
Modify Word                     Allows the user to enter the modification dialog to
                                change actions performed by the command.
OK                              Ends the current action taken by the user and returns
                                the user to Dictate or Command mode, whichever mode
                                the action was initiated from.
Oops                            Starts the DragonDictate correction sequence.
Select 1 ... Select 10          Selects the numbered word from the list of possible
                                words heard by DragonDictate. Does not return the
                                user to Dictate mode. Allows for multiple corrections.
Spell Mode                      Allows the user to spell the word phonetically using
                                Alpha-Bravo words, listing possible words as they
                                are spelled.
Word Left 1 ... Word Left 5     Moves the cursor left one to five words and lists
                                possible alternatives.
Word Right 1 ... Word Right 5   Moves the cursor right one to five words and lists
                                possible alternatives.
I. NUMBERS AND KEYS

Commands for numbers from 1 through 99 are given by just saying the number. The same is also true for keys on the keyboard. For example, to use the {Tab} key the user would just say “Tab.” The following list provides the rest of the number set.

Command                         Action

Zero                            Types character “0”
Hundred                         Types characters “00”
Thousand                        Types characters “000”
Million                         Types characters “000000”
Point                           Types character “.”, without two spaces following as
                                if used for punctuation.
Comma (numeric)                 Types character “,” with no space following.
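Taken literally, the number commands above behave like text substitutions: a spoken number from 1 through 99 types its digits, and Hundred, Thousand, and Million append zeros. The sketch below models only that literal reading. The function name and the token format (numbers passed as digit strings, “Comma” standing for the numeric comma) are hypothetical, and how DragonDictate actually combines a phrase such as “five hundred twelve” is not described in this appendix, so the sketch makes no attempt to handle it.

```python
# Typed output for each spoken number command, per the table above.
NUMBER_COMMANDS = {
    "Zero": "0",
    "Hundred": "00",
    "Thousand": "000",
    "Million": "000000",
    "Point": ".",
    "Comma": ",",
}


def type_number(spoken_tokens):
    """Concatenate the characters typed for a sequence of spoken tokens."""
    typed = []
    for token in spoken_tokens:
        if token in NUMBER_COMMANDS:
            typed.append(NUMBER_COMMANDS[token])
        elif token.isdigit() and 1 <= int(token) <= 99:
            typed.append(token)  # numbers 1-99 are typed as spoken
        else:
            raise ValueError(f"unsupported token: {token!r}")
    return "".join(typed)


print(type_number(["5", "Hundred"]))
print(type_number(["12", "Thousand"]))
print(type_number(["3", "Point", "5"]))
```

Under this reading, “5 Hundred” yields 500 and “12 Thousand” yields 12000, which is why only Zero and the three multiplier words are needed to complete the number set.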
APPENDIX B. IN³ EVALUATION VOCABULARY

The following is a list of the commands used for the evaluation of IN³ Voice Command for SPARCstation version 2.2.2. Combinations of these commands were used to comprise the 114 commands.

Command                 Action

MICROPHONE              Switches microphone off and on.
Audio Tool              Manages Audiotool using fwmm⁸ mode.
Binder                  Manages Binder application using fwmm mode.
Bookmarks               Presses the “Bookmarks” button in Netscape.
Calculator              Manages Calculator application using fwmm mode.
Calendar                Manages Calendar application using fwmm mode.
Cancel                  Presses the “Cancel” button in Frame Maker.
Clear Window            Clears the editing window when composing mail in
                        Mailtool.
Clock                   Manages Clock application using fwmm mode.
Close Browser           Minimizes the Netscape browser to an icon, regardless
                        of which window has the focus. Uses fwmm mode with
                        embedded mouse movements.
Close Mail              Closes the Mailtool window but does not save changes.
Command Tool            Manages Commandtool application using fwmm mode.
Compose                 Presses the “Compose” button in Mailtool to begin
                        writing new mail.
Day View                Uses captured mouse movement commands with fwmm mode
                        to change Calendar tool to day view.
Delete Message          Deletes the current message in the Mailtool
                        application. Uses embedded
                        {Front:t:/usr/shell/mail/username} command.
Deliver                 Presses the “Deliver” button in the Mailtool
                        application composition window.
Done                    Presses the “Done” button in the Mailtool window.
                        Saves changes made to the in-box.
Exit                    Exits Frame Maker application.
File Manager            Manages File Manager application using fwmm mode.
Frame Maker             Uses embedded mouse movements and the “maker &”
                        command to give focus to the Commandtool window and
                        to start Frame Maker.
Front                   Sends the current window to the front of the screen
                        (or to the back of the screen if the window is
                        currently at the front of the screen).
Help                    Starts the Help dialogue for OpenWindows 3. Uses fwmm
                        mode.
Icon                    Minimizes the current window to an icon.
Icon Editor             Manages Icon Editor application using fwmm mode.
In3                     Manages IN³ Voice Command application using fwmm
                        mode. Does not start IN³. IN³ must already be active.
Info                    Presses the “Info” button in Frame Maker.
Load In-Box             Loads the Mailtool in-box.
Lower                   Sends the current window to a lower screen level.
Mail                    Manages Mailtool application using fwmm mode.
Meter                   Manages CPU Meter tool application using fwmm mode.
Month View              Uses captured mouse movement commands with fwmm mode
                        to change Calendar tool to current month view.
Netscape                Manages Netscape World Wide Web Browser application
                        using fwmm mode.
New CommandTool         Uses fexec⁹ mode to start a new Commandtool window.
New Shell Tool          Uses fexec mode to start a new Shelltool window.
Next Message            Views the next message in the Mailtool application.
                        Uses embedded {Front:t:/usr/shell/mail/username}
                        command.
Ok                      Presses the “Ok” button when leaving Frame Maker.
Open                    Presses the “Open” button in Frame Maker to open a
                        document.
Open Location           Presses the “File|Open location” menu selection in
                        Netscape using embedded mouse movement commands.
Previous Message        Views the previous message in the Mailtool
                        application. Uses embedded
                        {Front:t:/usr/shell/mail/username} command.
Print                   Presses the “Print” button in Netscape and the “Ok”
                        button to print the current document.
Print Message           Presses the “Print” dialogue menu selection in
                        Mailtool using embedded mouse movement commands.
Print Tool              Manages Printtool application using fwmm mode.
Printer 14              Changes printer selection using fwmm mode and
                        embedded mouse movement commands.
Printer 2               Changes printer selection using fwmm mode and
                        embedded mouse movement commands.
Quit Audio              Dismisses the Audiotool application window.
Quit Binder             Dismisses the Binder application window.
Quit Browser            Dismisses the Netscape browser application window.
Quit Command Tool       Dismisses the Commandtool application window.
Quit Editor             Dismisses the Text Editor application window.
Quit Icon               Dismisses the Icon Editor application window.
Quit Mail               Dismisses the Mailtool application window.
Quit Meter              Dismisses the CPU Meter application window.
Quit Snapshot           Dismisses the Snapshot application window.
Quit Tapetool           Dismisses the Tapetool application window.
Quit Tool               Dismisses the Shelltool application window.
Refresh                 Refreshes the workspace desktop in OpenWindows 3.
                        Uses embedded mouse movement commands.
Reply                   Presses the “Reply” button in Mailtool to start the
                        composition dialogue for replying to the currently
                        selected message.
Save Workspace          Uses captured mouse movement commands to save the
                        current OpenWindows 3 workspace configuration.
scroll down             Uses captured mouse movement commands to scroll the
                        Mailtool window down one page.
scroll up               Uses captured mouse movement commands to scroll the
                        Mailtool window up one page.
Shell Tool              Manages Shelltool application using fwmm mode.
Signature               Adds a signature to composed mail in Mailtool by
                        using captured mouse movement commands to select the
                        correct menu item.
Snapshot                Manages Snapshot application using fwmm mode.
Tape Tool               Manages Tapetool application using fwmm mode.
Text Editor             Manages Text Editor application using fwmm mode.
Today                   Uses captured mouse movement commands with fwmm mode
                        to change Calendar tool to the current day view.
View Message            Views the current message in the Mailtool
                        application. Uses embedded
                        {Front:t:/usr/shell/mail/username} command.
Web Search              Presses the “Net Search” button in the Netscape
                        browser.
Week View               Uses captured mouse movement commands with fwmm mode
                        to change Calendar tool to the current week view.
Year View               Uses captured mouse movement commands with fwmm mode
                        to change Calendar tool to the current year view.

8 Recall that fwmm is the windows management mode as detailed in Chapter IV, p. 35.
9 Recall that fexec mode is the execution mode as detailed in Chapter IV, p. 36.



APPENDIX C. USER’S GUIDE FOR DRAGONDICTATE FOR WINDOWS 1.3


This section covers installation and setup tips as well as other useful suggestions that are either missing from the User’s Guide and Quick Start Manual or not well documented there.

A. INSTALLATION 

When installing DragonDictate, the user is given the opportunity either to install everything or to install just the files required for a single user. I have found that it is better to install everything, because doing so makes it much easier to add more users later. If you do not install everything, the software will request that you insert disk number five of the installation diskette set when you add a user. Unless you happen to have this particular diskette handy (which I did not), you must find the disk and insert it into the requested drive. Choosing to install everything initially avoids this. Whenever you wish to add a new user, the program will then accomplish the necessary steps itself, and the dialogue for creating a new user will begin after about 20 seconds.

B. TRAINING 

During the initial training, DragonDictate’s training level is set at the default level, which is light. This level enables the user to complete training with minimal time expended, but it offers the least initial accuracy. This level only requires that the user repeat the word three times, whether the word is recognized or not. It is recommended to set the level of training at intense initially. The intense level requires the user to utter more repetitions of each word, but it offers the highest level of initial accuracy. The user is prompted to utter the command six times, and three more times if there is an error in the recognition of the word during the initial six utterances. This level setting takes a bit longer (about 45 minutes of total training time), but the improved accuracy and the time not spent correcting errors are worth the extra time. Improvements in the speaker-independent voice recognition models used in DragonDictate version 2.0 make initial training optional; recognition of words can be performed immediately after installation.
C: ADDING NON-SUPPORTED WINDOWS APPLICATIONS 

Non-supported applications can be added to the DragonDictate For Windows group in 
three ways: 1) dragging the application icon to the DragonDictate For Windows group and 
dropping it, 2) adding the application using the Program Manager “File|New|Program Item” 
menu selection while the DragonDictate For Windows group is open, or 3) copying a 
program item from another group to the DragonDictate For Windows group using the 
Program Manager “Copy Program Item” menu selection and dialogue (Figure 20). In any 
case, you may have to rename the program icon using the Program Manager 
“File|Properties” menu selection. As shown in Figure 20, it would be preferable to rename 
“Weudora” to “Eudora.” 


[Screenshot: Copy Program Item dialogue. Copy Program Item: Weudora. From Program 
Group: Network Stuff.] 

Figure 21. Copy Program Item dialogue window¹⁰ 


¹⁰ This dialogue is opened by choosing the “File|Copy” menu selection from the Windows 
Program Manager main window menu. 

After copying the application, it will be necessary to train the non-supported 
application command. This is accomplished by opening DragonDictate’s vocabulary 
manager and choosing the “Find Word” button. The word in this case will be “[Eudora]”. 
All commands in DragonDictate are enclosed in brackets. Once the command is located, 
click on the train command button. This will begin the training process for the 
non-supported application command. Once this is completed, you will be able to use the 
“[Bring Up]” command to start the non-supported application by voice. All of the 
vocabulary for the menu items in the non-supported application can also be accessed by 
voice; DragonDictate will be able to track these automatically. 

D. ADDING VOCABULARY FOR UPGRADED APPLICATIONS 

Adding vocabulary for upgraded applications is very simple, though the User’s 
Guide does not address this problem. For our example, let us use an upgrade from 
WordPerfect 6.0 (an existing default vocabulary is installed with the program) to 
WordPerfect 6.1 (there is no supporting vocabulary for this application). When 
WordPerfect 6.1 is opened using DragonDictate, a vocabulary called WPWin 6.1 is added 
to the list of vocabularies in the vocabulary manager. Use the vocabulary manager to 
export this vocabulary as a text file named “WP61.txt”, following the import/export 
vocabulary method described in the User’s Guide. Export the WordPerfect 6.0 vocabulary 
in the same way and call it “WP60.txt.” Open “WP60.txt” using Notepad, or any other text 
editor, and copy the entire document with the exception of the first two lines. Next, open 
“WP61.txt” and paste the copied text after the two lines that are already present in 
“WP61.txt”. Save and close both documents. Use the vocabulary manager to import 
“WP61.txt” back into DragonDictate. Now all of the voice commands from WordPerfect 6.0 
are available to WordPerfect 6.1. 
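
The copy-and-paste step between the two exported files can also be scripted. The 
following is a minimal sketch in Python, assuming only what is stated above: both exports 
are plain text files, and the first two lines of each are left alone. The function names are 
hypothetical, and nothing is assumed about the content of the exported lines themselves. 

```python
from pathlib import Path

# Per the procedure above, the first two lines of each export are not copied.
HEADER_LINES = 2


def merge_vocabulary_exports(old_text: str, new_text: str) -> str:
    """Paste the body of the old export after the header lines of the new one.

    old_text: contents of the older export (for example "WP60.txt")
    new_text: contents of the newer export (for example "WP61.txt")
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # Keep the new file's first two lines, then the old file's body,
    # then anything the new file already had after its first two lines.
    merged = (new_lines[:HEADER_LINES]
              + old_lines[HEADER_LINES:]
              + new_lines[HEADER_LINES:])
    return "\n".join(merged) + "\n"


def merge_files(old_path: str, new_path: str) -> None:
    """Rewrite new_path in place with the merged vocabulary text."""
    old_text = Path(old_path).read_text()
    new_text = Path(new_path).read_text()
    Path(new_path).write_text(merge_vocabulary_exports(old_text, new_text))
```

Calling merge_files("WP60.txt", "WP61.txt") would leave “WP61.txt” ready to be imported 
with the vocabulary manager, as in the manual procedure. 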

E. CREATING NEW COMMANDS 

Creating new commands is very simple in DragonDictate. Using the method 
described in the User’s Guide, it is possible to develop custom commands. The 
command “[Scratch 2]” and others were created by modifying the command “[Scratch 
That]”. The resulting action from “[Scratch That]” is copied and pasted into the new 
command, and the “Resulting Action” text is then changed either to include the line 
“RejectPreviousWord 1” multiple times (Figure 21) or to use a 2, 3, 4, or 5 instead of 
the 1. In this way it is possible to create commands that delete multiple words. 
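
The two variants of the “Resulting Action” text can be written out as follows. This is 
only an illustrative sketch in Python that builds the text to be typed or pasted into the 
“Resulting Action” box; it does not interact with DragonDictate. The 
“RejectPreviousWord” line is taken from the description above, and the function names 
are hypothetical. 

```python
def scratch_script_repeated(word_count: int) -> str:
    """Variant 1: repeat the single-word line once per word to delete."""
    if word_count < 1:
        raise ValueError("word_count must be at least 1")
    return "\n".join(["RejectPreviousWord 1"] * word_count)


def scratch_script_counted(word_count: int) -> str:
    """Variant 2: one line with the word count in place of the 1.

    The text above mentions the values 2 through 5; larger values are untested.
    """
    if word_count < 1:
        raise ValueError("word_count must be at least 1")
    return f"RejectPreviousWord {word_count}"


def command_name(word_count: int) -> str:
    """Command names are enclosed in brackets, e.g. [Scratch 2]."""
    return f"[Scratch {word_count}]"


if __name__ == "__main__":
    for n in (2, 3):
        print(command_name(n))
        print(scratch_script_repeated(n))
        print(scratch_script_counted(n))
```

Either variant is entered as the resulting action of the new command; the first mirrors 
the two-line script used for “[Scratch 2]”. 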


[Screenshot: Modify Word dialogue for the new command. The “Resulting Action” box 
contains two “RejectPreviousWord 1” lines.] 

Figure 22. Adding a new command “[Scratch 2].” 


By inserting keystrokes instead of scripts, it is possible to add other commands. 
The commands “[Back]”, “[Forward]”, and “[Reload]” were created for Netscape 
Navigator using this method (Figure 22). Choosing the “Tools|Capture Keystrokes” menu 
selection in the “Resulting Action” box opens a dialogue window that captures the 
keystrokes performed by the user. The keystrokes are then transferred to the resulting 
action box (Figure 23). Using these two methods allows the user to add custom commands 
to augment the default commands for applications, and to add to the command vocabulary 
for non-supported applications. This allows DragonDictate’s 30,000-word vocabulary to be 
tailored to fit the requirements of the user. The vocabulary does not expand or increase; 
words that are not used are simply dropped out of the vocabulary to make room for the 
new words. 


[Screenshot: Capture Keystrokes dialogue. Prompt: “Press the keys exactly as you want 
DragonDictate to send them to your application.” Footer: “Press the Control, Shift, or Alt 
key by itself to stop recording (or click OK).” Buttons: OK, Cancel, Help.] 

Figure 23. Capturing Keystrokes 

APPENDIX D. DICTATION TEST PARAGRAPH 


The following passage from “Of the Standard of Taste” by David Hume was used as a 
control to measure the accuracy and learning capacity of DragonDictate [Ref. 17: p. 210]: 


The great resemblance between mental and bodily taste will easily teach us to apply this 
story. Though it be certain, that beauty and deformity, more than sweet and bitter, are not 
qualities in objects, but belong entirely to the sentiment, internal or external; it must be 
allowed, that there are certain qualities in objects, which are fitted by nature to produce those 
particular feelings. Now as these qualities may be found in a small degree, or may be mixed 
and confounded with each other, it often happens that the taste is not affected with such minute 
qualities, or is not able to distinguish all the particular flavours, amidst the disorder in which 
they are presented. Where the organs are so fine, as to allow nothing to escape them; and at 
the same time so exact, as to perceive every ingredient in the composition: This we call 
delicacy of taste, whether we employ these terms in the literal or metaphorical sense. Here 
then the general rules of beauty are of use, being drawn from established models, and from the 
observation of what pleases or displeases, when presented singly and in a high degree: And if 
the same qualities, in a continued composition, and in a smaller degree, affect not the organs 
with a sensible delight or uneasiness, we exclude the person from all pretensions to this 
delicacy. To produce these general rules or avowed patterns of composition, is like finding the 
key with the leathern thong; which justified the verdict of Sancho’s kinsmen, and confounded 
those pretended judges who had condemned them. Though the hogshead had never been 
emptied, the taste of the one was still equally delicate, and that of the other equally dull and 
languid: But it would have been more difficult to have proved the superiority of the former, to 
the conviction of every bye-stander.... 